JP2015230504A - Snapshot control device, snapshot control method, and snapshot control program

Info

Publication number: JP2015230504A
Application number: JP2014115005A
Authority: JP
Inventors: Koji Sato (佐藤 孝治); Kazuki Oikawa (及川 一樹); Koyo Yamamoto (山本 公洋)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2014-06-03
Filing date: 2014-06-03
Publication date: 2015-12-21
Anticipated expiration: 2034-06-03
Also published as: JP5947336B2

Abstract

PROBLEM TO BE SOLVED: To copy a chunk appropriately even when writing would otherwise become impossible, for example because a data storage unit has run short of free space. SOLUTION: File data is divided into fixed-length chunks and stored across a plurality of data servers. When a master server 100 accepts a write request to a snapshot of such file data, it determines the data server to which the chunk targeted by the write request is to be copied by copy-on-write. The determination is based on one or more of: the free space in the data storage unit of each data server, the distance between the data server holding the chunk to be copied and each data server, and the number of chunks shared by a plurality of files in each data server. The master server 100 then controls the determined data server so that the chunk targeted by the write request is copied to it.

Description

The present invention relates to a snapshot control device, a snapshot control method, and a snapshot control program.

In recent years, distributed file systems have come into wide use. A distributed file system consists of a large number of servers connected via a network and provides a very large storage device. Typically, a distributed file system is built by networking tens to hundreds of servers, or more, spread across a plurality of racks.

Known examples of such distributed file systems include the Google (registered trademark) File System (hereinafter GFS; see Non-Patent Document 1) and the Hadoop (registered trademark) Distributed File System (hereinafter HDFS; see Non-Patent Document 2). In these distributed file systems, a file is divided into fixed-length chunks, which are stored on the servers that make up the distributed file system (hereinafter referred to as data servers). To increase availability and fault tolerance, each chunk is replicated to as many data servers as a specified redundancy level (3 by default in GFS and HDFS). As a result, even if one data server fails, processing can continue using the chunks stored on the other data servers.

GFS and HDFS provide a snapshot function. A snapshot is a copy of a file or directory tree at a certain point in time. Snapshots can be used for purposes such as backing up data and recovering data after an erroneous operation.

In GFS, readable and writable snapshots of files and directories can be created. Snapshots are implemented using copy-on-write, which allows snapshots to be created quickly and storage to be used efficiently.

In HDFS, read-only snapshots of directories can be created. Because no data is copied when a snapshot is created, snapshots can be created quickly.

Non-Patent Document 1: Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung, The Google File System, Proceedings of the 19th ACM Symposium on Operating Systems Principles, pages 29-43, October 2003.
Non-Patent Document 2: Hadoop, [online], [retrieved May 9, 2014], Internet <http://hadoop.apache.org/>

However, in GFS as described above, a write to a snapshot causes the target chunk to be copied locally, by copy-on-write, within the data server that holds that chunk. Consequently, in a situation where writing is impossible, for example because the storage has run short of free space, the chunk may not be copied properly. In addition, on a data server that holds many files included in snapshots, writes to the snapshots increase the number of chunks the server holds, which can increase the load on that data server.

For example, snapshot creation and writes to a snapshot in GFS work as follows. Suppose that file 2 is created as a snapshot of file 1. Once file 2 has been created as a snapshot of file 1, file 1 and file 2 share the chunks that make up the file. When file 2 is then written to, copy-on-write causes the target chunk to be copied locally within the data server that holds it, and file 2 is changed to refer to the copied chunk. The write is applied to the copied chunk. File 1 continues to refer to the original chunk, and the chunks that are not write targets remain shared between file 1 and file 2.

When GFS copies a chunk by copy-on-write, the chunk is copied locally within the data server that holds the chunk to be written. Therefore, once the storage has no free space left, the chunk can no longer be copied locally within that data server. In addition, on a data server that holds many chunks shared by a plurality of files through snapshots, writes to the snapshots increase the number of chunks held, which can increase the load on that data server.

The number of files and directories for which snapshots are created, and the read and write load on snapshots, vary by use case. It is therefore necessary to control copy-on-write chunk copying according to the situation.

To solve the problems described above and achieve the object, the snapshot control device of the present invention comprises a determination unit and a control unit. When a write request to a snapshot of file data that is divided into fixed-length chunks and stored across a plurality of data servers is accepted, the determination unit determines the data server to which the chunk targeted by the write request is to be copied by copy-on-write. The determination is based on one or more of: the free space of the data storage unit in each data server, the distance between the data server holding the chunk to be copied and each data server, the load state of each data server, and the number of chunks shared by a plurality of files in each data server. The control unit controls the data server determined by the determination unit so that the chunk targeted by the write request is copied to it.

The snapshot control method of the present invention is a snapshot control method executed by a snapshot control device, and includes a determination step and a control step. In the determination step, when a write request to a snapshot of file data that is divided into fixed-length chunks and stored across a plurality of data servers is accepted, the data server to which the chunk targeted by the write request is to be copied by copy-on-write is determined based on one or more of: the free space of the data storage unit in each data server, the distance between the data server holding the chunk to be copied and each data server, the load state of each data server, and the number of chunks shared by a plurality of files in each data server. In the control step, the data server determined in the determination step is controlled so that the chunk targeted by the write request is copied to it.

The snapshot control program of the present invention causes a computer to execute a determination step and a control step. In the determination step, when a write request to a snapshot of file data that is divided into fixed-length chunks and stored across a plurality of data servers is accepted, the data server to which the chunk targeted by the write request is to be copied by copy-on-write is determined based on one or more of: the free space of the data storage unit in each data server, the distance between the data server holding the chunk to be copied and each data server, the load state of each data server, and the number of chunks shared by a plurality of files in each data server. In the control step, the data server determined in the determination step is controlled so that the chunk targeted by the write request is copied to it.

According to the present invention, even if the data storage unit of a data server holding a chunk shared by a plurality of files runs short of free space, the chunk can be copied to another data server whose data storage unit has free space, so that the chunk is copied appropriately.

FIG. 1 is a diagram illustrating an example of the configuration of the snapshot control system according to the first embodiment.
FIG. 2 is a diagram illustrating the relationship between file data and fixed-length chunks in the first embodiment.
FIG. 3 is a diagram illustrating the relationship between a snapshot and fixed-length chunks in the first embodiment.
FIG. 4 is a diagram illustrating an example of the configuration of the master server in the first embodiment.
FIG. 5 is a diagram illustrating an example of the configuration of the file table in the first embodiment.
FIG. 6 is a diagram illustrating an example of the configuration of the chunk table in the first embodiment.
FIG. 7 is a diagram illustrating an example of the configuration of the data server state management table in the first embodiment.
FIG. 8 is a diagram illustrating an example of the configuration of the data server in the first embodiment.
FIG. 9 is a diagram illustrating an example of the configuration of the chunk information table in the first embodiment.
FIG. 10 is a flowchart illustrating an example of the operation when the external application transmits a file generation request to the master server in the file generation process of the first embodiment.
FIG. 11 is a flowchart illustrating an example of the operation when the master server receives a file generation request from the external application in the file generation process of the first embodiment.
FIG. 12 is a flowchart illustrating an example of the operation when the external application transmits a directory generation request to the master server in the directory generation process of the first embodiment.
FIG. 13 is a flowchart illustrating an example of the operation when the master server receives a directory generation request from the external application in the directory generation process of the first embodiment.
FIG. 14 is a flowchart illustrating an example of the operation when the external application transmits a snapshot generation request to the master server in the snapshot generation process of the first embodiment.
FIG. 15 is a flowchart illustrating an example of the operation when the master server receives a snapshot generation request from the external application in the snapshot generation process of the first embodiment.
FIG. 16 is a flowchart illustrating an example of the operation when the master server receives a snapshot generation request from the external application in the snapshot generation process of the first embodiment.
FIG. 17 is a flowchart illustrating an example of the operation when the master server receives a snapshot generation request from the external application in the snapshot generation process of the first embodiment.
FIG. 18 is a flowchart illustrating an example of the operation when the external application writes to a file in the write process of the first embodiment.
FIG. 19 is a flowchart illustrating an example of the operation when the master server receives a chunk information acquisition request in the write process of the first embodiment.
FIG. 20 is a flowchart illustrating an example of the operation when the master server receives a chunk information acquisition request in the write process of the first embodiment.
FIG. 21 is a flowchart illustrating an example of the operation when the master server receives a chunk information acquisition request in the write process of the first embodiment.
FIG. 22 is a flowchart illustrating an example of the operation when the master server receives a chunk information acquisition request in the write process of the first embodiment.
FIG. 23 is a flowchart illustrating an example of the operation when a data server receives a local chunk copy request from the master server in the write process of the first embodiment.
FIG. 24 is a flowchart illustrating an example of the operation when a data server receives a remote chunk copy request from the master server in the write process of the first embodiment.
FIG. 25 is a flowchart illustrating an example of the operation when a data server receives a chunk read request from another data server in the write process of the first embodiment.
FIG. 26 is a flowchart illustrating an example of the operation when a data server receives a chunk generation request from the master server in the write process of the first embodiment.
FIG. 27 is a flowchart illustrating an example of the operation when a data server receives a write control delegation request from the master server in the write process of the first embodiment.
FIG. 28 is a flowchart illustrating an example of the operation when a data server receives a data transmission request from the external application or another data server in the write process of the first embodiment.
FIG. 29 is a flowchart illustrating an example of the operation when the primary data server receives a write request from the external application in the write process of the first embodiment.
FIG. 30 is a flowchart illustrating an example of the operation when a secondary data server receives a secondary write request from the primary data server in the write process of the first embodiment.
FIG. 31 is a diagram illustrating a computer that executes the snapshot control program.

Embodiments of the snapshot control device, snapshot control method, and snapshot control program according to the present invention are described in detail below with reference to the accompanying drawings. The present invention is not limited by these embodiments.

[Example of Configuration of Snapshot Control System in First Embodiment]
FIG. 1 is a diagram illustrating an example of the configuration of the snapshot control system according to the first embodiment.

As shown in FIG. 1, the snapshot control system in the first embodiment comprises a master server 100 and a plurality of data servers 200, 300, 400, 500, and 600. The master server 100 and the data servers 200, 300, 400, 500, and 600 are connected by a network 800. The master server 100 is a snapshot control device that holds the metadata used to manage the namespace of the distributed file system, the allocation of chunks to data servers, and the like. The data servers 200, 300, 400, 500, and 600 are storage devices that hold chunks. An external application 700 issues requests such as creating and deleting files and directories, creating and deleting snapshots, and reading and writing files. FIG. 1 shows one master server as an example; to increase availability and fault tolerance, a plurality of master servers may be provided and metadata synchronized among them. FIG. 1 also shows five data servers as an example, but the number of data servers is not limited to five.

Each of the data servers 200, 300, 400, 500, and 600 is assigned a data server identifier that uniquely identifies it. For example, the IP address assigned to the network interface of the computer on which the data server runs is used as the data server identifier. In the example of FIG. 1, the data servers 200, 300, 400, 500, and 600 are assigned the data server identifiers ds2, ds3, ds4, ds5, and ds6, respectively.

File data written by the external application 700 is divided into fixed-length chunks and stored on the data servers. Each chunk is assigned a chunk identifier that uniquely identifies it. Each chunk is stored on as many data servers as the specified redundancy level.

FIG. 2 is a diagram illustrating the relationship between file data and fixed-length chunks in the first embodiment. In the example of FIG. 2, file data 900 is divided into fixed-length chunks 910, 920, and 930, which are assigned the chunk identifiers c1, c2, and c3, respectively. If the file size is not a multiple of the fixed length, the last chunk 930 is smaller than the fixed length. In addition, if the file contains a range in which no data exists, a chunk may be smaller than the fixed length, or a range may have no chunk allocated to it.
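The division into fixed-length chunks described for FIG. 2 can be sketched as follows. This is a minimal illustration only: the function name `split_into_chunks` and the 4-byte chunk size are assumptions made for the example and do not appear in the specification.

```python
from typing import List


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Split file data into fixed-length chunks (hypothetical helper).

    Every chunk has chunk_size bytes except possibly the last one,
    which is shorter when len(data) is not a multiple of chunk_size.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


# 10 bytes of file data with an illustrative chunk size of 4 bytes.
chunks = split_into_chunks(b"x" * 10, 4)
sizes = [len(c) for c in chunks]
print(sizes)  # [4, 4, 2]: the trailing chunk is smaller than the fixed length
```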

FIG. 3 is a diagram illustrating the relationship between a snapshot and fixed-length chunks in the first embodiment. FIG. 3(1) shows the state of file 1 (1000) before the snapshot is created. File 1 (1000) consists of chunks 1010, 1020, and 1030, identified by the chunk identifiers c1, c2, and c3. FIG. 3(2) shows the state of file 1 (1000) and file 2 (1100) when file 2 (1100) has been created as a snapshot of file 1 (1000). File 1 (1000) and file 2 (1100) share the chunks 1010, 1020, and 1030 identified by the chunk identifiers c1, c2, and c3.

FIG. 3(3) shows the state of file 1 (1000) and file 2 (1100) after a write to the end of file 2 (1100). When file 2 (1100) is written to, the write target, chunk 1030 identified by the chunk identifier c3, is copied to generate chunk 1130, identified by the chunk identifier c4. File 2 (1100) is changed to refer to chunk 1130, and the write is applied to chunk 1130. File 1 (1000) continues to refer to chunk 1030 identified by the chunk identifier c3. Chunk 1010 (chunk identifier c1) and chunk 1020 (chunk identifier c2) remain shared between file 1 (1000) and file 2 (1100).
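The sequence in FIG. 3 (snapshot creation, then a write that triggers copy-on-write) can be modeled with a small in-memory sketch. Everything here is an assumption made for illustration: the class `Namespace`, its method names, and the simplification that a write replaces a whole chunk. Placement of the copied chunk on a data server is deliberately left out.

```python
from typing import Dict, List


class Namespace:
    """Toy metadata model: files refer to chunk ids, chunks carry a reference count."""

    def __init__(self) -> None:
        self.files: Dict[str, List[str]] = {}    # path -> chunk identifier list
        self.ref_count: Dict[str, int] = {}      # chunk id -> number of files sharing it
        self.chunk_data: Dict[str, bytes] = {}   # chunk id -> contents
        self._next_id = 1

    def _new_chunk_id(self) -> str:
        cid = f"c{self._next_id}"
        self._next_id += 1
        return cid

    def create_file(self, path: str, blocks: List[bytes]) -> None:
        ids = []
        for block in blocks:
            cid = self._new_chunk_id()
            self.chunk_data[cid] = block
            self.ref_count[cid] = 1
            ids.append(cid)
        self.files[path] = ids

    def snapshot(self, src: str, dst: str) -> None:
        # No data is copied: the snapshot shares every chunk of the source file.
        self.files[dst] = list(self.files[src])
        for cid in self.files[dst]:
            self.ref_count[cid] += 1

    def write(self, path: str, index: int, data: bytes) -> None:
        cid = self.files[path][index]
        if self.ref_count[cid] > 1:
            # Copy-on-write: copy the shared chunk and re-point only this file.
            new_cid = self._new_chunk_id()
            self.chunk_data[new_cid] = self.chunk_data[cid]
            self.ref_count[cid] -= 1
            self.ref_count[new_cid] = 1
            self.files[path][index] = new_cid
            cid = new_cid
        self.chunk_data[cid] = data  # the write lands on the (possibly copied) chunk


ns = Namespace()
ns.create_file("/file1", [b"a", b"b", b"c"])   # chunks c1, c2, c3
ns.snapshot("/file1", "/file2")                # FIG. 3(2): all chunks shared
ns.write("/file2", 2, b"C")                    # FIG. 3(3): c3 is copied to c4
print(ns.files["/file1"], ns.files["/file2"])
# ['c1', 'c2', 'c3'] ['c1', 'c2', 'c4']
```

After the write, c1 and c2 keep a reference count of 2 while c3 and c4 each have a reference count of 1, and file 1 still sees the original contents of c3.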

[Example of Configuration of Master Server 100 in First Embodiment]
FIG. 4 is a diagram illustrating an example of the configuration of the master server 100 according to the first embodiment. The master server 100 includes a file management unit 110 and a data server state management unit 120. The file management unit 110 manages the creation and deletion of files and directories, the allocation of chunks to files, and the allocation of the data servers that hold chunks. The file management unit 110 holds a file table 111 that associates each file with the chunks constituting it, and a chunk table 112 that associates each chunk with the data servers holding it. The file management unit 110 also includes a determination unit 113 and a control unit 114.

When a write request to a snapshot of file data that is divided into fixed-length chunks and stored across a plurality of data servers is accepted, the determination unit 113 determines the data server to which the chunk targeted by the write request is to be copied by copy-on-write. The determination is based on one or more of: the free space of the data storage unit in each data server, the distance between the data server holding the chunk to be copied and each data server, the load state of each data server, and the number of chunks shared by a plurality of files in each data server.

For example, from among the data servers whose data storage unit has free space equal to or greater than a predetermined threshold, the determination unit 113 determines as the copy destination the data server with the lowest load state, as acquired by the data server state management unit 120 described later.

As another example, from among the data servers whose data storage unit has free space equal to or greater than a predetermined threshold, the determination unit 113 determines as the copy destination the data server at the smallest distance from the data server holding the chunk to be copied.

As a further example, from among the data servers whose data storage unit has free space equal to or greater than a predetermined threshold, the determination unit 113 determines as the copy destination one of the data servers on which the number of chunks shared by a plurality of files is less than a predetermined threshold.
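The three selection examples given for the determination unit 113 can be sketched as three small functions that share the same free-space filter. This is an illustrative sketch, not the patented implementation: the `ServerState` fields, the function names, the numeric thresholds, and the sample values are all assumptions, and the specification does not say how load or distance is measured.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServerState:
    server_id: str
    free_bytes: int      # free space of the data storage unit
    load: float          # load state (unit is an assumption)
    distance: int        # distance from the server holding the chunk to be copied
    shared_chunks: int   # number of chunks shared by a plurality of files


def eligible(servers: List[ServerState], min_free: int) -> List[ServerState]:
    """Keep only servers whose free space is at or above the threshold."""
    return [s for s in servers if s.free_bytes >= min_free]


def pick_lowest_load(servers: List[ServerState], min_free: int) -> Optional[str]:
    candidates = eligible(servers, min_free)
    return min(candidates, key=lambda s: s.load).server_id if candidates else None


def pick_nearest(servers: List[ServerState], min_free: int) -> Optional[str]:
    candidates = eligible(servers, min_free)
    return min(candidates, key=lambda s: s.distance).server_id if candidates else None


def pick_few_shared(servers: List[ServerState], min_free: int,
                    max_shared: int) -> Optional[str]:
    # Any one server below the shared-chunk threshold qualifies; take the first.
    for s in eligible(servers, min_free):
        if s.shared_chunks < max_shared:
            return s.server_id
    return None


servers = [
    ServerState("ds2", free_bytes=0,   load=0.9, distance=0, shared_chunks=2),
    ServerState("ds3", free_bytes=500, load=0.2, distance=2, shared_chunks=1),
    ServerState("ds4", free_bytes=800, load=0.5, distance=1, shared_chunks=3),
    ServerState("ds5", free_bytes=50,  load=0.1, distance=1, shared_chunks=0),
]

by_load = pick_lowest_load(servers, min_free=100)
by_distance = pick_nearest(servers, min_free=100)
by_shared = pick_few_shared(servers, min_free=100, max_shared=2)
print(by_load, by_distance, by_shared)  # ds3 ds4 ds3
```

In this sample, ds2 holds the source chunk but has no free space and ds5 is below the free-space threshold, so both are excluded before any of the three criteria is applied.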

Here, the administrator can freely choose which parameter is used to determine the copy destination from among the data servers whose data storage unit has free space equal to or greater than the predetermined threshold: the "distance" from the data server holding the chunk to be copied, the "load state" of each data server, or the "number of chunks" shared by a plurality of files in each data server. For example, in a snapshot control system in which the data servers are far apart from one another on the network, it is preferable to configure the system so that the data server with the smallest "distance" from the data server holding the chunk to be copied is determined as the copy destination. In this way, the parameter can be chosen freely according to the characteristics of the snapshot control system.

It is also possible to configure a plurality of parameters, and to select a plurality of parameters and assign priorities to them. For example, the "load state" of each data server may be given the highest priority and the "number of chunks" shared by a plurality of files in each data server the second highest. In that case, if there are a plurality of data servers with the lowest load state, a data server among them on which the number of chunks shared by a plurality of files is less than a predetermined threshold may be determined as the copy destination.
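The prioritized example above (load state first, number of shared chunks second) might look like the following sketch. The names `Candidate` and `choose_with_priorities` and the sample values are hypothetical. The fallback to any tied server when none is below the shared-chunk threshold is also an assumption, since the specification does not state what happens in that case.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    server_id: str
    free_bytes: int
    load: float
    shared_chunks: int


def choose_with_priorities(candidates: List[Candidate], min_free: int,
                           max_shared: int) -> Optional[str]:
    """Priority 1: lowest load. Priority 2: shared-chunk count below a threshold."""
    usable = [c for c in candidates if c.free_bytes >= min_free]
    if not usable:
        return None
    lowest = min(c.load for c in usable)
    tied = [c for c in usable if c.load == lowest]
    # Second-priority parameter is consulted only to break the tie on load.
    preferred = [c for c in tied if c.shared_chunks < max_shared]
    # Assumption: if no tied server is below the threshold, fall back to any tied server.
    return (preferred or tied)[0].server_id


pool = [
    Candidate("ds3", free_bytes=500, load=0.2, shared_chunks=5),
    Candidate("ds4", free_bytes=800, load=0.2, shared_chunks=1),
    Candidate("ds5", free_bytes=900, load=0.6, shared_chunks=0),
    Candidate("ds6", free_bytes=10,  load=0.1, shared_chunks=0),
]

chosen = choose_with_priorities(pool, min_free=100, max_shared=3)
print(chosen)  # ds4: ds3 and ds4 tie on load, and only ds4 is below the threshold
```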

The control unit 114 controls the data server determined by the determination unit 113 so that the chunk targeted by the write request is copied to it.

The data server state management unit 120 manages the state of the data servers. It holds a data server state management table 121, which stores information on the state of each data server. From every data server, the data server state management unit 120 acquires the maximum amount of data that can be stored in that data server's data storage unit, the free space of the data storage unit, and the load state of the data server. The data storage unit of a data server is described later.

For each file or directory, the file table 111 holds the path name of the file or directory, a path type indicating whether it is a file or a directory, and a chunk identifier list, which is the list of the chunk identifiers of the chunks constituting the file. In addition to these, the file table 111 may hold other information about the file or directory.

図５は、第１の実施形態におけるファイルテーブル１１１の構成の一例を示す図である。図５の例では、パス名として、ディレクトリ「／」、ファイル「／ｆｉｌｅ１」、ファイル「／ｆｉｌｅ２」、ファイル「／ｆｉｌｅ３」がファイルテーブル１１１に登録されている。パス種別では、「ｆ」はファイル、「ｄ」はディレクトリを表している。第１の実施形態におけるディレクトリは、当該ディレクトリ内のファイルやディレクトリに関する情報を持たないため、ディレクトリにおけるチャンク識別子リストは使用されない。図５の例では、ディレクトリ「／」の未使用のチャンク識別子リストを「−」で表している。ファイル「／ｆｉｌｅ１」はチャンク識別子ｃ１、ｃ２、ｃ３で識別されるチャンクで構成されている。ファイル「／ｆｉｌｅ２」はファイル「／ｆｉｌｅ１」のスナップショットとして作成されたあと、ファイルの末尾へ書き込みを行った状態であり、チャンク識別子ｃ１、ｃ２、ｃ４で識別されるチャンクで構成されている。ファイル「／ｆｉｌｅ３」はチャンク識別子ｃ５で識別されるチャンクで構成されている。 FIG. 5 is a diagram illustrating an example of the configuration of the file table 111 according to the first embodiment. In the example of FIG. 5, a directory “/”, a file “/ file1”, a file “/ file2”, and a file “/ file3” are registered in the file table 111 as path names. In the path type, “f” represents a file and “d” represents a directory. Since the directory in the first embodiment does not have information regarding files and directories in the directory, the chunk identifier list in the directory is not used. In the example of FIG. 5, the unused chunk identifier list of the directory “/” is represented by “−”. The file “/ file1” is composed of chunks identified by chunk identifiers c1, c2, and c3. The file “/ file2” is created as a snapshot of the file “/ file1” and then written to the end of the file, and is composed of chunks identified by chunk identifiers c1, c2, and c4. The file “/ file3” is composed of chunks identified by the chunk identifier c5.

The chunk table 112 holds, for each chunk, the chunk identifier of the chunk, a reference count indicating the number of files sharing the chunk, and a data server identifier list, which is a list of the data server identifiers of the data servers storing the chunk.

FIG. 6 is a diagram illustrating an example of the configuration of the chunk table 112 in the first embodiment. In the example of FIG. 6, the chunk identified by chunk identifier c1 has a reference count of 2 and is stored in each of the data servers identified by data server identifiers ds2, ds3, and ds4. The chunk identified by chunk identifier c2 has a reference count of 2 and is stored in each of the data servers identified by ds2, ds5, and ds6. The chunk identified by chunk identifier c3 has a reference count of 1 and is stored in each of the data servers identified by ds2, ds4, and ds5. The chunk identified by chunk identifier c4 has a reference count of 1 and is stored in each of the data servers identified by ds2, ds4, and ds6. The chunk identified by chunk identifier c5 has a reference count of 1 and is stored in each of the data servers identified by ds3, ds4, and ds5.

In the example of FIG. 6, the chunks identified by chunk identifiers c1 and c2 are shared by the files “/file1” and “/file2”, so each has a reference count of 2. When the end of the file “/file2” was written, copy-on-write copied the chunk identified by chunk identifier c3 to the chunk identified by chunk identifier c4, and the write was then applied to the chunk identified by c4. In the data servers identified by ds2 and ds4, the copy-on-write copy of the chunk was performed locally within the same data server, whereas the data server identified by ds5 copied the chunk to the data server identified by ds6.
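The relationship between the two tables can be checked with a small sketch. The dictionaries below transcribe the values given for FIG. 5 and FIG. 6; the dictionary layout, the function names, and the way source-only and destination-only servers are paired are assumptions made for illustration.

```python
from collections import Counter

# Values as described for FIG. 5 (file table 111).
FILE_TABLE = {
    "/":      {"type": "d", "chunks": None},
    "/file1": {"type": "f", "chunks": ["c1", "c2", "c3"]},
    "/file2": {"type": "f", "chunks": ["c1", "c2", "c4"]},
    "/file3": {"type": "f", "chunks": ["c5"]},
}

# Values as described for FIG. 6 (chunk table 112).
CHUNK_TABLE = {
    "c1": {"ref_count": 2, "servers": ["ds2", "ds3", "ds4"]},
    "c2": {"ref_count": 2, "servers": ["ds2", "ds5", "ds6"]},
    "c3": {"ref_count": 1, "servers": ["ds2", "ds4", "ds5"]},
    "c4": {"ref_count": 1, "servers": ["ds2", "ds4", "ds6"]},
    "c5": {"ref_count": 1, "servers": ["ds3", "ds4", "ds5"]},
}


def derive_ref_counts(file_table):
    """Count how many files reference each chunk identifier."""
    counts = Counter()
    for entry in file_table.values():
        if entry["type"] == "f":
            counts.update(entry["chunks"])
    return dict(counts)


def split_copies(src_chunk, dst_chunk, chunk_table):
    """Return (servers that copied locally, [(source server, destination server), ...])."""
    src = chunk_table[src_chunk]["servers"]
    dst = chunk_table[dst_chunk]["servers"]
    local = [s for s in src if s in dst]
    src_only = [s for s in src if s not in dst]
    dst_only = [s for s in dst if s not in src]
    # Assumption: source-only and destination-only servers are paired in list order.
    return local, list(zip(src_only, dst_only))
```

The reference counts derived from the file table agree with the chunk table, and comparing the server lists of c3 and c4 recovers the local copies on ds2 and ds4 and the remote copy from ds5 to ds6.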

The data server state management table 121 holds, for each data server, the data server identifier of the data server, the maximum capacity of data that can be stored in the data storage unit of the data server, the free capacity of the data storage unit, and the load state of the data server. In addition to these, the data server state management table 121 may hold other information on the data server.

FIG. 7 is a diagram illustrating an example of the configuration of the data server state management table 121 in the first embodiment. In the example of FIG. 7, for the data server 200 identified by data server identifier ds2, the maximum capacity of data that can be stored in the data storage unit is 2.0 terabytes (TB), the free capacity is 1.0 TB, and the load state is 0.5. For the data server 300 identified by ds3, the maximum capacity is 2.0 TB, the free capacity is 1.1 TB, and the load state is 1.0. For the data server 400 identified by ds4, the maximum capacity is 1.0 TB, the free capacity is 0.6 TB, and the load state is 3.0. For the data server 500 identified by ds5, the maximum capacity is 1.0 TB, the free capacity is 0.1 TB, and the load state is 0.8. For the data server 600 identified by ds6, the maximum capacity is 1.5 TB, the free capacity is 0.7 TB, and the load state is 0.5.

In this way, when a chunk is copied by copy-on-write, the master server 100 determines the copy destination of the chunk on the basis of various copy-on-write strategies. Consequently, even if the data storage unit of a data server holding a chunk shared by a plurality of files runs short of free capacity, the chunk can be copied to another data server whose data storage unit has free capacity, so that writing can continue and the load can be distributed among the data servers.
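One possible strategy of this kind is sketched below using the values given for FIG. 7. The policy itself is an assumption: copy locally when the holding server has at least `min_free_tb` of free capacity (a hypothetical parameter), and otherwise pick the least-loaded server with enough free capacity, skipping servers in `exclude` (for example, servers that already hold a replica of the new chunk).

```python
# Values as described for FIG. 7 (data server state management table 121).
STATE_TABLE = {
    "ds2": {"max_tb": 2.0, "free_tb": 1.0, "load": 0.5},
    "ds3": {"max_tb": 2.0, "free_tb": 1.1, "load": 1.0},
    "ds4": {"max_tb": 1.0, "free_tb": 0.6, "load": 3.0},
    "ds5": {"max_tb": 1.0, "free_tb": 0.1, "load": 0.8},
    "ds6": {"max_tb": 1.5, "free_tb": 0.7, "load": 0.5},
}


def pick_destination(holder, exclude, state_table, min_free_tb):
    """Return the server that should receive the copy-on-write copy, or None."""
    # Local copy when the server holding the shared chunk still has room.
    if state_table[holder]["free_tb"] >= min_free_tb:
        return holder
    # Otherwise: least-loaded server with room; more free capacity breaks ties.
    candidates = [
        (row["load"], -row["free_tb"], server_id)
        for server_id, row in state_table.items()
        if server_id != holder
        and server_id not in exclude
        and row["free_tb"] >= min_free_tb
    ]
    if not candidates:
        return None  # no data server can accept the copy
    return min(candidates)[2]
```

With `min_free_tb=0.2` and ds2 and ds4 excluded, a copy that cannot stay on ds5 goes to ds6. This happens to match the destination shown in FIG. 6, although the description does not state which strategy produced that example.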

[Example of data server configuration in the first embodiment]
FIG. 8 is a diagram illustrating an example of the configuration of the data servers 200, 300, 400, 500, and 600 in the first embodiment. The data servers 200, 300, 400, 500, and 600 all have the same configuration, and the data server 200 is described below as a representative example. In the following description, the reference numerals of the corresponding components of the other data servers 300, 400, 500, and 600 are given in parentheses where necessary.

The data server 200 (300, 400, 500, 600) includes a data storage unit 210 (310, 410, 510, 610), a data access unit 220 (320, 420, 520, 620), and a state notification unit 230 (330, 430, 530, 630).

The data storage unit 210 (310, 410, 510, 610) holds all the chunks stored by the data server 200 (300, 400, 500, 600) and a chunk information table 211 (311, 411, 511, 611).

The chunk information table 211 (311, 411, 511, 611) in the first embodiment holds, for each stored chunk, the chunk identifier of the chunk, the storage location of the chunk within the data storage unit 210 (310, 410, 510, 610), and the size of the chunk. In addition to these, the chunk information table 211 (311, 411, 511, 611) may hold other information on the chunk.

FIG. 9 is a diagram illustrating an example of the configuration of the chunk information table 211 (311, 411, 511, 611) in the first embodiment. In the example of FIG. 9, the chunk identified by chunk identifier c1 is stored at the position identified by “p1” in the data storage unit 210 and has a size of 64 megabytes (MB). The chunk identified by chunk identifier c2 is stored at the position identified by “p2” and has a size of 64 MB. The chunk identified by chunk identifier c3 is stored at the position identified by “p3” and has a size of 48 MB. The chunk identified by chunk identifier c4 is stored at the position identified by “p4” and has a size of 64 MB.

The data access unit 220 (320, 420, 520, 620) reads and writes the chunks stored in the data storage unit 210 (310, 410, 510, 610). The data access unit 220 (320, 420, 520, 620) also reads and writes the information in the chunk information table 211 (311, 411, 511, 611) stored in the data storage unit 210 (310, 410, 510, 610).

The state notification unit 230 (330, 430, 530, 630) monitors the state of the data storage unit 210 (310, 410, 510, 610) and the load state of the data server 200 (300, 400, 500, 600), and notifies the master server 100 of them. Specifically, the state notification unit 230 (330, 430, 530, 630) periodically detects the maximum capacity of data that can be stored in the data storage unit 210 (310, 410, 510, 610), the free capacity of the data storage unit 210 (310, 410, 510, 610), and the load state of the data server 200 (300, 400, 500, 600), and notifies the master server 100 of the detected maximum capacity, free capacity, and load state.

The master server 100 updates the maximum capacity, free capacity, and load state of the data server in the data server state management table 121 of the data server state management unit 120 with the notified values. In addition to these, the state notification unit 230 (330, 430, 530, 630) may notify the master server 100 of other information on the data server 200 (300, 400, 500, 600). Although the state is detected and notified periodically here, detection and notification may instead be triggered by the occurrence of some event.
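The notification and update flow described above can be sketched as follows. The class names, field names, and units are hypothetical; the description specifies only which three values are reported and that the table row for the reporting data server is overwritten.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatusReport:
    """What a state notification unit sends to the master server (names assumed)."""
    server_id: str
    max_capacity_tb: float
    free_capacity_tb: float
    load: float


class DataServerStateTable:
    """In-memory stand-in for the data server state management table 121."""

    def __init__(self):
        self._rows = {}

    def apply(self, report):
        # Overwrite the row of the reporting data server with the notified values.
        self._rows[report.server_id] = {
            "max_capacity_tb": report.max_capacity_tb,
            "free_capacity_tb": report.free_capacity_tb,
            "load": report.load,
        }

    def get(self, server_id):
        # Return a copy so callers cannot modify the table by accident.
        return dict(self._rows[server_id])
```

Because each report replaces the whole row, the same code path serves periodic reports and event-triggered ones.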

[Example of snapshot control in the first embodiment]
First, an example of the processing for generating a file or a directory is described in [Example of file generation processing] and [Example of directory generation processing]. Next, an example of the processing for generating a snapshot of a file or a directory is described in [Example of snapshot generation processing]. Finally, an example of the processing for writing to a file is described in [Example of write processing].

[Example of file generation processing]
[Example of processing of external application 700 in file generation processing]
FIG. 10 is a flowchart illustrating an example of the operation performed when the external application 700 transmits a file generation request to the master server 100 in the file generation processing of the first embodiment.

The external application 700 transmits a file generation request and a file path name to the master server 100 (step S1001). The external application 700 then receives a file generation response from the master server 100 (step S1002).

[Example of processing of master server 100 in file generation processing]
FIG. 11 is a flowchart illustrating an example of the operation performed when the master server 100 receives a file generation request from the external application 700 in the file generation processing of the first embodiment.

The master server 100 receives the file generation request and the file path name from the external application 700 (step S1101). The master server 100 then registers the file path name, the path type “f” indicating a file, and an empty chunk identifier list in the file table 111 of the file management unit 110 (step S1102). Subsequently, the master server 100 transmits a file generation response to the external application 700 (step S1103).

[Example of directory generation processing]
[Example of processing of external application 700 in directory generation processing]
FIG. 12 is a flowchart illustrating an example of the operation performed when the external application 700 transmits a directory generation request to the master server 100 in the directory generation processing of the first embodiment.

The external application 700 transmits a directory generation request and a directory path name to the master server 100 (step S1201). The external application 700 then receives a directory generation response from the master server 100 (step S1202).

[Example of processing of master server 100 in directory generation processing]
FIG. 13 is a flowchart illustrating an example of the operation performed when the master server 100 receives a directory generation request from the external application 700 in the directory generation processing of the first embodiment.

The master server 100 receives the directory generation request and the directory path name from the external application 700 (step S1301). The master server 100 then registers the directory path name and the path type “d” indicating a directory in the file table 111 of the file management unit 110 (step S1302). The chunk identifier list is not used. Subsequently, the master server 100 transmits a directory generation response to the external application 700 (step S1303).
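The master-side registration steps for files (S1102) and directories (S1302) can be sketched as below. The class name, method names, and response strings are hypothetical, and handling of an already registered path is omitted because the description does not cover it.

```python
class FileTable:
    """Minimal stand-in for the file table 111 of the file management unit 110."""

    def __init__(self):
        self.entries = {}

    def create_file(self, path):
        # S1102: register the path name, path type "f", and an empty chunk identifier list.
        self.entries[path] = {"type": "f", "chunks": []}
        return "file_generation_response"  # S1103 (response name assumed)

    def create_directory(self, path):
        # S1302: register the path name and path type "d"; the chunk list is unused.
        self.entries[path] = {"type": "d", "chunks": None}
        return "directory_generation_response"  # S1303 (response name assumed)
```

For example, calling `create_directory("/")` and then `create_file("/file1")` yields one “d” entry with no chunk list and one “f” entry with an empty chunk list.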

[Example of snapshot generation processing]
[Example of processing of external application 700 in snapshot generation processing]
FIG. 14 is a flowchart illustrating an example of the operation performed when the external application 700 transmits a snapshot generation request to the master server 100 in the snapshot generation processing of the first embodiment.

The external application 700 transmits a snapshot generation request, a snapshot source path name, and a snapshot destination path name to the master server 100 (step S1401). The external application 700 then receives a snapshot generation response from the master server 100 (step S1402).

[Example of processing of master server 100 in snapshot generation processing]
FIG. 15 is a flowchart illustrating an example of the operation performed when the master server 100 receives a snapshot generation request from the external application 700 in the snapshot generation processing of the first embodiment.

The master server 100 receives the snapshot generation request, the snapshot source path name, and the snapshot destination path name from the external application 700 (step S1501).

The master server 100 then acquires the path type and the chunk identifier list for the snapshot source path name from the file table 111 of the file management unit 110 (step S1502).

Subsequently, the master server 100 proceeds to step S1504 if the path type is “f” and to step S1505 if it is “d” (step S1503). If the path type is “f”, the master server 100 executes file snapshot processing (steps S1504-1 to S1504-5) (step S1504).

The file snapshot processing of the master server 100 is now described with reference to FIG. 16. The master server 100 registers the snapshot destination path name, the path type “f”, and the chunk identifier list acquired in step S1502 in the file table 111 of the file management unit 110 (step S1504-1). The master server 100 then executes the processing of step S1504-4, described below, for every chunk identifier in the chunk identifier list.

First, an index indicating the position of a chunk identifier in the chunk identifier list is set to the head of the list (step S1504-2).

Subsequently, the master server 100 determines whether all chunk identifiers in the chunk identifier list have been processed (step S1504-3). If so, the processing ends; otherwise, the processing proceeds to step S1504-4.

If not all chunk identifiers in the chunk identifier list have been processed, the master server 100 increments by 1 the reference count, in the chunk table 112 of the file management unit 110, of the chunk identifier at the index position in the chunk identifier list (step S1504-4). The master server 100 then advances the index to the next position in the chunk identifier list (step S1504-5).
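The loop of steps S1504-1 to S1504-5 can be sketched as follows. The dictionary layout of the two tables and the function name are assumptions; the explicit index mirrors the flowchart rather than idiomatic iteration.

```python
def snapshot_file(file_table, chunk_table, src_path, dst_path):
    """File snapshot: share every chunk of src_path with dst_path; no chunk data is copied."""
    chunk_ids = file_table[src_path]["chunks"]  # acquired in S1502
    # S1504-1: register the destination with the same chunk identifier list.
    file_table[dst_path] = {"type": "f", "chunks": list(chunk_ids)}
    index = 0                                   # S1504-2: start at the head of the list
    while index < len(chunk_ids):               # S1504-3: finished when all are processed
        chunk_table[chunk_ids[index]]["ref_count"] += 1  # S1504-4
        index += 1                              # S1504-5: move to the next position
```

Only metadata changes here: the snapshot costs one file table entry and one reference count increment per chunk, which is why later writes to a shared chunk must go through copy-on-write.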

Returning to the description of FIG. 15, if the path type is “d”, the master server 100 executes directory snapshot processing (steps S1505-1 to S1505-11) (step S1505). In the directory snapshot processing, snapshots are created recursively for the files and directories under the directory.

The directory snapshot processing of the master server 100 is now described with reference to FIG. 17. The master server 100 acquires, from the file table 111 of the file management unit 110, a list of all path names under the snapshot source path name (hereinafter, the snapshot source subordinate path name list) (step S1505-1).

The master server 100 executes the processing of steps S1505-4 to S1505-10 for every path name in the snapshot source subordinate path name list. First, index 1, which indicates the position of a path name in the snapshot source subordinate path name list, is set to the head of the list (step S1505-2).

The master server 100 then determines whether all path names in the snapshot source subordinate path name list have been processed (step S1505-3). If so, the processing ends; otherwise, the processing proceeds to step S1505-4.

If not all path names in the snapshot source subordinate path name list have been processed, the master server 100 acquires, from the file table 111 of the file management unit 110, the path type and the chunk identifier list for the path name at the position of index 1 in the list (step S1505-4).

Subsequently, the master server 100 registers the following in the file table 111 of the file management unit 110: the path name obtained by replacing, in the path name at the position of index 1, the portion from the root directory to the parent directory with the snapshot destination path name; the path type acquired in step S1505-4; and the chunk identifier list acquired in step S1505-4 (step S1505-5).

The master server 100 then determines whether the path type is “f” or “d”, and proceeds to step S1505-7 if it is “f” and to step S1505-11 if it is “d” (step S1505-6). When the path type is “f”, the master server 100 executes the processing of step S1505-9, described below, for every chunk identifier in the chunk identifier list.

First, index 2, which indicates the position of a chunk identifier in the chunk identifier list, is set to the head of the chunk identifier list (step S1505-7).

The master server 100 then proceeds to step S1505-11 if all chunk identifiers in the chunk identifier list have been processed, and to step S1505-9 otherwise (step S1505-8).

If not all chunk identifiers in the chunk identifier list have been processed, the master server 100 increments by 1 the reference count, in the chunk table 112 of the file management unit 110, of the chunk identifier at the position of index 2 in the chunk identifier list (step S1505-9).

The master server 100 then advances index 2 to the next position in the chunk identifier list (step S1505-10). Once all chunk identifiers in the chunk identifier list have been processed, the master server 100 advances index 1 to the next position in the snapshot source subordinate path name list (step S1505-11).
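The directory snapshot loop (steps S1505-1 to S1505-11) can be sketched as follows. The table layout and function name are assumptions, and the path rewriting is one interpretation of step S1505-5: the snapshot source prefix of each subordinate path is replaced by the snapshot destination path name, so nested paths keep their relative structure.

```python
def snapshot_directory(file_table, chunk_table, src_dir, dst_dir):
    """Register every path under src_dir beneath dst_dir, sharing all file chunks."""
    prefix = src_dir.rstrip("/") + "/"
    # S1505-1: list of all path names under the snapshot source path name.
    under = [p for p in file_table if p.startswith(prefix) and len(p) > len(prefix)]
    for path in under:                      # index 1 (S1505-2, S1505-3, S1505-11)
        entry = file_table[path]            # S1505-4: path type and chunk identifier list
        new_path = dst_dir.rstrip("/") + "/" + path[len(prefix):]
        chunks = None if entry["chunks"] is None else list(entry["chunks"])
        file_table[new_path] = {"type": entry["type"], "chunks": chunks}  # S1505-5
        if entry["type"] == "f":            # S1505-6: only files have chunks to share
            for chunk_id in entry["chunks"]:               # index 2 (S1505-7, -8, -10)
                chunk_table[chunk_id]["ref_count"] += 1    # S1505-9
```

Because the file table already stores full path names, the "recursive" snapshot needs no tree traversal in this sketch: a single pass over the subordinate path list covers nested directories as well. An entry for the destination directory itself is not created here, since the steps described do not mention it.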

Returning to the description of FIG. 15, the master server 100 transmits a snapshot generation response to the external application 700 (step S1506).

[Example of write processing]
[Example of processing of external application 700 in write processing]
FIG. 18 is a flowchart illustrating an example of the operation performed when the external application 700 writes to a file in the write processing of the first embodiment.

The external application 700 transmits a chunk information acquisition request, a file path name, and a write position (an offset from the beginning of the file) to the master server 100 (step S1601).

The external application 700 then receives, from the master server 100, the chunk identifier of the chunk corresponding to the write position (hereinafter, the write target chunk identifier) and a list of the data server identifiers (hereinafter, the write target chunk holding data server identifier list) of the data servers holding the chunk identified by the write target chunk identifier (hereinafter, the write target chunk holding data servers) (step S1602).

Subsequently, the external application 700 selects the nearest data server from among the data servers identified by the data server identifiers in the write target chunk holding data server identifier list, and transmits to the selected data server a data transmission request, the write target chunk identifier, the write data, and the write target chunk holding data server identifier list (step S1603).

A data server that has received the data transmission request transmits the data transmission request to the nearest data server among the write target chunk holding data servers that have not yet received it. This forwarding is repeated until all the write target chunk holding data servers have received the data transmission request. The operation of a data server on receiving the data transmission request is described later.

As an example, the distance between two data servers is defined as the natural logarithm of the exclusive OR of the IP addresses assigned to the network interfaces of the computers on which the two data servers run, plus 1. In this example, the higher the bit position at which the two IP addresses first differ, the greater the distance. Here, it is assumed as an example that the data server 200 is selected as the nearest data server.
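The example distance and the chained forwarding of the data transmission request can be sketched as follows. The function names and sample addresses are hypothetical, IPv4 is assumed, and returning 0 for identical addresses is an added assumption because the logarithm of 0 is undefined.

```python
import ipaddress
import math


def distance(ip_a, ip_b):
    """ln(ip_a XOR ip_b) + 1, following the example definition in the text."""
    xor = int(ipaddress.IPv4Address(ip_a)) ^ int(ipaddress.IPv4Address(ip_b))
    if xor == 0:
        return 0.0  # assumption: identical addresses are treated as distance 0
    return math.log(xor) + 1.0


def forwarding_order(sender_ip, holders):
    """Order in which chunk-holding servers receive the data transmission request.

    holders maps a data server identifier to its IP address. Each hop forwards
    to the nearest holder that has not yet received the request.
    """
    remaining = dict(holders)
    current_ip = sender_ip
    order = []
    while remaining:
        nearest = min(
            remaining,
            key=lambda sid: (distance(current_ip, remaining[sid]), sid),
        )
        order.append(nearest)
        current_ip = remaining.pop(nearest)  # the receiver becomes the next sender
    return order
```

Since the XOR of two addresses is dominated by the highest differing bit, servers on the same subnet come out closer than servers whose addresses differ in an upper octet, so the chain tends to visit nearby servers first.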

The external application 700 then receives a data transmission response from the data server 200 (step S1604). Subsequently, the external application 700 transmits a write request, the write target chunk identifier, and the write position to the primary data server (step S1605).

The primary data server is a data server to which the master server 100 has delegated control over writing to the chunk. The master server 100 selects it from among the data servers identified by the data server identifiers in the write target chunk holding data server identifier list. The data servers in that list other than the primary data server are referred to as secondary data servers. The method of selecting the primary data server is described later. Here, it is assumed as an example that the data server 400 has been selected as the primary data server. The external application 700 then receives a write response from the data server 400 (step S1606).

[Example of processing of master server 100 in write processing]
FIG. 19 is a flowchart illustrating an example of the operation when the master server 100 receives a chunk information acquisition request from the external application 700 in the write processing according to the first embodiment.

マスタサーバ１００は外部アプリケーション７００からチャンク情報取得要求と、ファイルパス名と、書き込み位置とを受信する（ステップＳ１７０１）。そして、マスタサーバ１００はファイル管理部１１０のファイルテーブル１１１から当該ファイルパス名に対するチャンク識別子リストを取得する（ステップＳ１７０２）。 The master server 100 receives a chunk information acquisition request, a file path name, and a writing position from the external application 700 (step S1701). Then, the master server 100 acquires a chunk identifier list for the file path name from the file table 111 of the file management unit 110 (step S1702).

マスタサーバ１００は当該チャンク識別子リスト内に当該書き込み位置に対応するチャンク識別子（以降、書き込み位置チャンク識別子）が存在すれば、ステップＳ１７０４へ、そうでなければ、ステップＳ１７０７へ遷移する（ステップＳ１７０３）。コピーオンライトを発生させる書き込みでは、当該書き込み位置チャンク識別子が存在するため、ステップＳ１７０４へ遷移することになる。 If there is a chunk identifier corresponding to the write position (hereinafter, write position chunk identifier) in the chunk identifier list, the master server 100 proceeds to step S1704; otherwise, the master server 100 proceeds to step S1707 (step S1703). In writing that causes copy-on-write, since the writing position chunk identifier exists, the process proceeds to step S1704.

そして、マスタサーバ１００はファイル管理部１１０のチャンクテーブル１１２における当該書き込み位置チャンク識別子に対する参照カウントが２以上ならば、ステップＳ１７０５へ、そうでなければ、ステップＳ１７０６へ遷移する（ステップＳ１７０４）。 If the reference count for the write position chunk identifier in the chunk table 112 of the file management unit 110 is 2 or more, the master server 100 proceeds to step S1705; otherwise, the master server 100 proceeds to step S1706 (step S1704).

In a write that triggers copy-on-write, the chunk identified by the write position chunk identifier is shared by a plurality of files and its reference count is 2 or more, so the process proceeds to step S1705. The master server 100 executes the copy-on-write process (steps S1705-1 to S1705-13) (step S1705).
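The branching in steps S1703 and S1704 can be sketched as follows. This is a minimal illustration under stated assumptions: the chunk identifier list is modeled as a Python list indexed by chunk position with `None` for unallocated positions, reference counts are a dict, and the name `classify_write` is hypothetical.

```python
from typing import Dict, List, Optional


def classify_write(
    chunk_ids: List[Optional[str]],
    position: int,
    ref_counts: Dict[str, int],
) -> str:
    """Decide which branch of FIG. 19 handles a write at the given chunk position."""
    # S1703: is there a chunk identifier for the write position?
    if position >= len(chunk_ids) or chunk_ids[position] is None:
        return "new_chunk"        # S1707: no chunk allocated yet
    # S1704: is the chunk shared by two or more files?
    if ref_counts[chunk_ids[position]] >= 2:
        return "copy_on_write"    # S1705: shared chunk must be copied first
    return "existing_chunk"       # S1706: unshared chunk is written in place
```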

ここで、図２０を用いて、コピーオンライト処理について説明する。マスタサーバ１００はファイル管理部１１０のチャンクテーブル１１２から当該書き込み位置チャンク識別子に対するデータサーバ識別子リスト（以降、コピー対象チャンク保持データサーバ識別子リスト）を取得する（ステップＳ１７０５−１）。ここでは、例として、コピー対象チャンク保持データサーバ識別子リストとしてｄｓ２、ｄｓ４、ｄｓ５を取得したとする。 Here, the copy-on-write process will be described with reference to FIG. The master server 100 acquires a data server identifier list (hereinafter, copy target chunk holding data server identifier list) for the write position chunk identifier from the chunk table 112 of the file management unit 110 (step S1705-1). Here, as an example, it is assumed that ds2, ds4, and ds5 are acquired as the copy target chunk holding data server identifier list.

そして、マスタサーバ１００は新たなチャンク識別子（以降、書き込み対象チャンク識別子）と、空のデータサーバ識別子リスト（以降、書き込み対象チャンク保持データサーバ識別子リスト）を生成する（ステップＳ１７０５−２）。 Then, the master server 100 generates a new chunk identifier (hereinafter, write target chunk identifier) and an empty data server identifier list (hereinafter, write target chunk holding data server identifier list) (step S1705-2).

マスタサーバ１００は当該コピー対象チャンク保持データサーバ識別子リスト内のすべてのデータサーバ識別子（以降、コピー対象チャンク保持データサーバ識別子）に対してステップＳ１７０５−５からステップＳ１７０５−１１の処理を実行する。初めに、当該コピー対象チャンク保持データサーバ識別子リスト内のコピー対象チャンク保持データサーバ識別子の位置を示すインデックスを当該コピー対象チャンク保持データサーバ識別子リスト内の先頭に設定する（ステップＳ１７０５−３）。 The master server 100 executes the processing from step S1705-5 to step S1705-11 for all data server identifiers in the copy target chunk holding data server identifier list (hereinafter, copy target chunk holding data server identifier). First, an index indicating the position of the copy target chunk holding data server identifier in the copy target chunk holding data server identifier list is set to the head in the copy target chunk holding data server identifier list (step S1705-3).

When all the copy target chunk holding data server identifiers in the copy target chunk holding data server identifier list have been processed, the master server 100 proceeds to step S1705-13; otherwise, it proceeds to step S1705-5 (step S1705-4).

If not all the copy target chunk holding data server identifiers in the copy target chunk holding data server identifier list have been processed, the master server 100 determines, based on the copy-on-write strategy, the data server that will hold the copied chunk (hereinafter, write target chunk holding data server) for the copy target chunk holding data server identifier at the index position in the list (step S1705-5).

As an example of a copy-on-write strategy, if the free space of the data storage unit of the data server identified by the copy target chunk holding data server identifier is equal to or greater than a predetermined threshold, that data server itself is selected as the write target chunk holding data server. Otherwise, data servers running on computers housed in the same rack as a computer running a data server already selected as a write target chunk holding data server are excluded, and from the remaining data servers whose data storage unit has free space equal to or greater than the predetermined threshold, one of the following is selected as the write target chunk holding data server: the data server whose minimum distance to the data servers identified by the copy target chunk holding data server identifiers in the list is smallest; the data server with the lowest load; or a data server for which the number of held chunks with a reference count of 2 or more is less than a predetermined threshold. The selection may also be based on any one or more of the following: the free space of the data storage unit of each data server, the distance between the data server holding the chunk to be copied and each data server, the load state of each data server, and the number of chunks shared by a plurality of files in each data server.
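One variant of the strategy above (prefer the current holder, otherwise the nearest server with enough free space in an unused rack) might look like the sketch below. The `DataServer` record, the `choose_target` function, and the caller-supplied `distance` callable are all hypothetical names introduced for illustration; raising an error when no candidate qualifies is also an assumption, since the text does not describe that case.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DataServer:
    ds_id: str
    rack: str
    free_bytes: int


def choose_target(
    source_id: str,
    holders: List[str],
    chosen: List[str],
    servers: Dict[str, DataServer],
    min_free: int,
    distance: Callable[[str, str], float],
) -> str:
    """Pick the data server that will hold the copy of a chunk held by source_id."""
    # Keep the copy local when the current holder has enough free space.
    if servers[source_id].free_bytes >= min_free:
        return source_id
    # Otherwise skip racks that already host a selected target ...
    used_racks = {servers[d].rack for d in chosen}
    candidates = [
        s for s in servers.values()
        if s.free_bytes >= min_free and s.rack not in used_racks
    ]
    if not candidates:
        # Assumption: the text does not say what happens in this case.
        raise RuntimeError("no data server satisfies the strategy")
    # ... and take the candidate whose minimum distance to any holder is smallest.
    best = min(candidates, key=lambda s: min(distance(s.ds_id, h) for h in holders))
    return best.ds_id
```

Calling this once per holder in list order, and appending each result to `chosen`, mirrors the loop of steps S1705-3 to S1705-12.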

To obtain quickly, for each data server, the number of chunks with a reference count of 2 or more, the master server 100 may hold a table that stores, for each data server, its data server identifier and the number of chunks on that data server whose reference count is 2 or more.
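Such a per-server table could be derived from the chunk table as sketched below. The dict layout of the chunk table (`ref_count` and `servers` keys) and the name `shared_chunk_counts` are assumptions for illustration; a real master server would more likely update the table incrementally rather than rebuild it.

```python
from collections import Counter
from typing import Dict


def shared_chunk_counts(chunk_table: Dict[str, dict]) -> Counter:
    """Map each data server identifier to its number of chunks with ref_count >= 2."""
    counts: Counter = Counter()
    for entry in chunk_table.values():
        if entry["ref_count"] >= 2:
            # Every data server holding a replica of a shared chunk is counted once.
            counts.update(entry["servers"])
    return counts
```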

The distributed file system may provide one or more copy-on-write strategies in advance and let the user select which one to use, or the user may define and use a custom copy-on-write strategy.

Here, as an example, the following selections are assumed. When the copy target chunk holding data server identifier is ds2, the data server 200 identified by the data server identifier ds2 is selected as the write target chunk holding data server. When it is ds4, the data server 400 identified by the data server identifier ds4 is selected. When it is ds5, the data server 600 identified by the data server identifier ds6 is selected.

そして、マスタサーバ１００は当該コピー対象チャンク保持データサーバ識別子と当該書き込み対象チャンク保持データサーバのデータサーバ識別子（以降、書き込み対象チャンク保持データサーバ識別子）とが同一ならば、ステップＳ１７０５−７へ、そうでなければ、ステップＳ１７０５−９へ遷移する（ステップＳ１７０５−６）。 If the copy target chunk holding data server identifier and the data server identifier of the writing target chunk holding data server (hereinafter, the writing target chunk holding data server identifier) are the same, the master server 100 moves to step S1705-7. Otherwise, the process proceeds to step S1705-9 (step S1705-6).

ここでは、コピー対象チャンク保持データサーバ識別子がｄｓ２、または、ｄｓ４のときはステップＳ１７０５−７へ、コピー対象チャンク保持データサーバ識別子がｄｓ５のときはステップＳ１７０５−９へ遷移することになる。 Here, when the copy target chunk holding data server identifier is ds2 or ds4, the process proceeds to step S1705-7, and when the copy target chunk holding data server identifier is ds5, the process proceeds to step S1705-9.

If the copy target chunk holding data server identifier and the write target chunk holding data server identifier are the same, the master server 100 uses the write position chunk identifier as the copy source chunk identifier and transmits a local chunk copy request, the copy source chunk identifier, and the write target chunk identifier to the write target chunk holding data server 200 (400) identified by the write target chunk holding data server identifier ds2 (ds4) (step S1705-7).

そして、マスタサーバ１００は書き込み対象チャンク保持データサーバ２００（４００）からローカルチャンクコピー応答を受信する（ステップＳ１７０５−８）。 Then, the master server 100 receives a local chunk copy response from the write target chunk holding data server 200 (400) (step S1705-8).

マスタサーバ１００は当該コピー対象チャンク保持データサーバ識別子と当該書き込み対象チャンク保持データサーバ識別子とが同一でないならば、当該書き込み位置チャンク識別子をコピー元チャンク識別子とし、書き込み対象チャンク保持データサーバ識別子ｄｓ６で識別される書き込み対象チャンク保持データサーバ６００へリモートチャンクコピー要求と、当該コピー元チャンク識別子と、当該書き込み対象チャンク識別子と、当該コピー対象チャンク保持データサーバ識別子リストとを送信する（ステップＳ１７０５−９）。 If the copy target chunk holding data server identifier and the writing target chunk holding data server identifier are not the same, the master server 100 uses the writing position chunk identifier as a copy source chunk identifier and is identified by the writing target chunk holding data server identifier ds6. The remote chunk copy request, the copy source chunk identifier, the write target chunk identifier, and the copy target chunk holding data server identifier list are transmitted to the write target chunk holding data server 600 (step S1705-9).

そして、マスタサーバ１００は書き込み対象チャンク保持データサーバ６００からリモートチャンクコピー応答を受信する（ステップＳ１７０５−１０）。 Then, the master server 100 receives a remote chunk copy response from the write target chunk holding data server 600 (step S1705-10).

When the master server 100 receives the local chunk copy response or the remote chunk copy response, it adds the write target chunk holding data server identifier ds2 (ds4, ds6) to the write target chunk holding data server identifier list (step S1705-11).

そして、マスタサーバ１００はインデックスを当該コピー対象チャンク保持データサーバ識別子リスト内の次の位置に設定する（ステップＳ１７０５−１２）。 Then, the master server 100 sets the index to the next position in the copy target chunk holding data server identifier list (step S1705-12).

マスタサーバ１００はコピー対象チャンク保持データサーバ識別子リスト内のすべてのコピー対象チャンク保持データサーバ識別子の処理が完了したら、チャンクテーブル１１２を更新する（ステップＳ１７０５−１３）。具体的には、マスタサーバ１００はファイル管理部１１０のチャンクテーブル１１２に当該書き込み対象チャンク識別子と、参照カウント「１」と、当該書き込み対象チャンク保持データサーバ識別子リストとを登録する。また、マスタサーバ１００はファイル管理部１１０のチャンクテーブル１１２における当該書き込み位置チャンク識別子に対する参照カウントの値を１減少させる。さらに、マスタサーバ１００はファイル管理部１１０のファイルテーブル１１１における当該ファイルパス名に対するチャンク識別子リスト内の当該書き込み位置チャンク識別子を当該書き込み対象チャンク識別子で置き換える。 When the master server 100 completes the processing of all the copy target chunk holding data server identifiers in the copy target chunk holding data server identifier list, the master server 100 updates the chunk table 112 (step S1705-13). Specifically, the master server 100 registers the write target chunk identifier, the reference count “1”, and the write target chunk holding data server identifier list in the chunk table 112 of the file management unit 110. Also, the master server 100 decreases the reference count value for the write position chunk identifier in the chunk table 112 of the file management unit 110 by one. Further, the master server 100 replaces the write position chunk identifier in the chunk identifier list for the file path name in the file table 111 of the file management unit 110 with the write target chunk identifier.
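The metadata update in step S1705-13 can be sketched as follows, assuming the file table maps a path to a list of chunk identifiers and the chunk table maps a chunk identifier to a reference count and holder list. These dict layouts and the name `commit_copy_on_write` are hypothetical.

```python
from typing import Dict, List


def commit_copy_on_write(
    file_table: Dict[str, List[str]],
    chunk_table: Dict[str, dict],
    path: str,
    position: int,
    new_id: str,
    new_holders: List[str],
) -> None:
    """Record the freshly copied chunk and detach the written file from the old one."""
    old_id = file_table[path][position]
    # Register the new chunk with reference count 1 and its holders.
    chunk_table[new_id] = {"ref_count": 1, "servers": list(new_holders)}
    # The written file no longer references the shared chunk.
    chunk_table[old_id]["ref_count"] -= 1
    # Point the file at the new chunk; other files keep the old identifier.
    file_table[path][position] = new_id
```

After this update, the snapshot and the written file share every chunk except the one at the written position.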

Returning to the description of FIG. 19, the master server 100 selects a primary data server and secondary data servers (step S1708). Specifically, the master server 100 selects any one write target chunk holding data server identifier from the write target chunk holding data server identifier list and makes the data server identified by that identifier the primary data server for this write. The data servers identified by the other write target chunk holding data server identifiers in the list are the secondary data servers.

当該書き込み対象チャンク保持データサーバ識別子リスト内で、どの書き込み対象チャンク保持データサーバ識別子がプライマリデータサーバのデータサーバ識別子であるかを判別できるように、例えば、プライマリデータサーバのデータサーバ識別子は当該書き込み対象チャンク保持データサーバ識別子リスト内の先頭に位置するようにする。その他の方法でプライマリデータサーバのデータサーバ識別子を判定できるようにしてもよい。 In order to be able to determine which write target chunk holding data server identifier is the data server identifier of the primary data server in the write target chunk holding data server identifier list, for example, the data server identifier of the primary data server is the writing target It should be positioned at the top of the chunk holding data server identifier list. The data server identifier of the primary data server may be determined by other methods.
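The head-of-list convention for marking the primary data server can be sketched in a few lines. The function name `order_with_primary_first` is hypothetical, and the caller is assumed to have already chosen the primary.

```python
from typing import List


def order_with_primary_first(holders: List[str], primary: str) -> List[str]:
    """Return the holder list with the primary at index 0 and secondaries after it."""
    if primary not in holders:
        raise ValueError("primary must be one of the chunk holders")
    return [primary] + [ds for ds in holders if ds != primary]
```

A receiver can then treat element 0 as the primary and the rest as secondaries without any extra field in the message.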

ここでは、例として、データサーバ識別子ｄｓ４で識別されるデータサーバ４００がプライマリデータサーバとして、データサーバ識別子ｄｓ２で識別されるデータサーバ２００と、データサーバ識別子ｄｓ６で識別されるデータサーバ６００とがセカンダリデータサーバとして、それぞれ選択されたとする。 Here, as an example, the data server 400 identified by the data server identifier ds4 is the primary data server, and the data server 200 identified by the data server identifier ds2 and the data server 600 identified by the data server identifier ds6 are secondary. Assume that each data server is selected.

そして、マスタサーバ１００はデータサーバ４００に書き込み制御委譲要求と、当該書き込み対象チャンク識別子と、当該書き込み対象チャンク保持データサーバ識別子リストとを送信する（ステップＳ１７０９）。 Then, the master server 100 transmits a write control delegation request, the write target chunk identifier, and the write target chunk holding data server identifier list to the data server 400 (step S1709).

そして、マスタサーバ１００はデータサーバ４００から書き込み制御委譲応答を受信する（ステップＳ１７１０）。続いて、マスタサーバ１００は外部アプリケーション７００へチャンク情報として、当該書き込み対象チャンク識別子と、当該書き込み対象チャンク保持データサーバ識別子リストとを送信する（ステップＳ１７１１）。 Then, the master server 100 receives a write control delegation response from the data server 400 (step S1710). Subsequently, the master server 100 transmits the write target chunk identifier and the write target chunk holding data server identifier list as chunk information to the external application 700 (step S1711).

コピーオンライトを発生させる書き込みでは、ステップＳ１７０６とステップＳ１７０７は実行されない。ステップＳ１７０６は参照カウントの値が１であり、複数のファイルに共有されていないチャンクへの書き込み処理、ステップＳ１７０７はチャンクが割り当てられていない書き込み位置への書き込み処理である。 In writing that causes copy-on-write, steps S1706 and S1707 are not executed. Step S1706 is a write process to a chunk that has a reference count value of 1 and is not shared by a plurality of files, and step S1707 is a write process to a write position to which no chunk is allocated.

ここで、図２１を用いて、前述した既存チャンク処理について説明する。マスタサーバ１００は既存チャンク情報を取得する（ステップＳ１７０６−１）。具体的には、マスタサーバ１００は当該書き込み位置チャンク識別子を書き込み対象チャンク識別子とする。また、マスタサーバ１００はファイル管理部１１０のチャンクテーブル１１２における書き込み対象チャンク識別子に対するデータサーバ識別子リストを取得し、書き込み対象チャンク保持データサーバ識別子リストとする。 Here, the above-described existing chunk processing will be described with reference to FIG. The master server 100 acquires existing chunk information (step S1706-1). Specifically, the master server 100 sets the write position chunk identifier as a write target chunk identifier. Further, the master server 100 acquires a data server identifier list for the write target chunk identifier in the chunk table 112 of the file management unit 110 and sets it as a write target chunk holding data server identifier list.

ここで、図２２を用いて、前述した新規チャンク処理について説明する。マスタサーバ１００は新規チャンク処理（ステップＳ１７０７−１〜ステップＳ１７０７−９）を実行する。 Here, the above-described new chunk processing will be described with reference to FIG. The master server 100 executes new chunk processing (step S1707-1 to step S1707-9).

まず、マスタサーバ１００は新たなチャンク識別子（以降、書き込み対象チャンク識別子）と、空のデータサーバ識別子リスト（以降、書き込み対象チャンク保持データサーバ識別子リスト）を生成する（ステップＳ１７０７−１）。 First, the master server 100 generates a new chunk identifier (hereinafter, write target chunk identifier) and an empty data server identifier list (hereinafter, write target chunk holding data server identifier list) (step S1707-1).

The master server 100 then executes the processing from step S1707-4 to step S1707-7 as many times as the redundancy (the number of replicas). First, the loop count is set to 0 (step S1707-2). Next, if the loop count is smaller than the redundancy, the process proceeds to step S1707-4; otherwise, it proceeds to step S1707-9 (step S1707-3).

マスタサーバ１００はループカウントの値が冗長度の数より小さければ、新たに生成するチャンクを保持する書き込み対象チャンク保持データサーバを決定する（ステップＳ１７０７−４）。例えば、データ記憶部の空き容量が定められた閾値以上、かつ、保持しているチャンクのうち、参照カウントの値が２以上のチャンクの数が定められた閾値未満のデータサーバのなかから、それぞれ異なるラックに収納されたコンピュータ上で動作するデータサーバを書き込み対象チャンク保持データサーバとして選択する。 If the value of the loop count is smaller than the number of redundancy levels, the master server 100 determines a write target chunk holding data server that holds a newly generated chunk (step S1707-4). For example, each of the data servers in which the free capacity of the data storage unit is equal to or greater than a predetermined threshold and the number of chunks having a reference count value of 2 or more among the retained chunks is less than the predetermined threshold. A data server operating on a computer housed in a different rack is selected as a write target chunk holding data server.

そして、マスタサーバ１００は当該書き込み対象チャンク保持データサーバへチャンク生成要求と、当該書き込み対象チャンク識別子とを送信する（ステップＳ１７０７−５）。続いて、マスタサーバ１００は当該書き込み対象チャンク保持データサーバからチャンク生成応答を受信する（ステップＳ１７０７−６）。 Then, the master server 100 transmits a chunk generation request and the write target chunk identifier to the write target chunk holding data server (step S1707-5). Subsequently, the master server 100 receives a chunk generation response from the write target chunk holding data server (step S1707-6).

そして、マスタサーバ１００は当該書き込み対象チャンク保持データサーバ識別子リストへ当該書き込み対象チャンク保持データサーバ識別子を追加する（ステップＳ１７０７−７）。続いて、マスタサーバ１００はループカウントの値を１増加させる（ステップＳ１７０７−８）。 Then, the master server 100 adds the write target chunk holding data server identifier to the write target chunk holding data server identifier list (step S1707-7). Subsequently, the master server 100 increases the value of the loop count by 1 (step S1707-8).

マスタサーバ１００はループカウントの値が冗長度の数以上ならば、チャンクテーブル１１２を更新する（ステップＳ１７０７−９）。具体的には、マスタサーバ１００はファイル管理部１１０のチャンクテーブル１１２に当該書き込み対象チャンク識別子と、参照カウント「１」と、当該書き込み対象チャンク保持データサーバ識別子リストとを登録する。また、マスタサーバ１００はファイル管理部１１０のファイルテーブル１１１における当該ファイルパス名に対するチャンク識別子リスト内の当該書き込み位置に当該書き込み対象チャンク識別子を追加する。 If the value of the loop count is equal to or greater than the number of redundancy levels, the master server 100 updates the chunk table 112 (step S1707-9). Specifically, the master server 100 registers the write target chunk identifier, the reference count “1”, and the write target chunk holding data server identifier list in the chunk table 112 of the file management unit 110. Further, the master server 100 adds the write target chunk identifier to the write position in the chunk identifier list for the file path name in the file table 111 of the file management unit 110.
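The server selection for a new chunk (step S1707-4, repeated once per replica) can be sketched as below. The dict layout with `free`, `shared`, and `rack` keys and the name `pick_new_chunk_servers` are assumptions; the sketch picks all replicas in one pass rather than one per loop iteration, and failing with an error when too few servers qualify is not described in the text.

```python
from typing import Dict, List


def pick_new_chunk_servers(
    servers: Dict[str, dict],
    redundancy: int,
    min_free: int,
    max_shared: int,
) -> List[str]:
    """Choose `redundancy` data servers, each in a different rack."""
    chosen: List[str] = []
    used_racks = set()
    for ds_id, info in servers.items():
        if len(chosen) == redundancy:
            break
        enough_space = info["free"] >= min_free
        few_shared = info["shared"] < max_shared  # chunks with reference count >= 2
        if enough_space and few_shared and info["rack"] not in used_racks:
            chosen.append(ds_id)
            used_racks.add(info["rack"])
    if len(chosen) < redundancy:
        # Assumption: the text does not cover this case.
        raise RuntimeError("not enough eligible data servers")
    return chosen
```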

[Example of data server processing in write processing]
FIG. 23 is a flowchart illustrating an example of the operation when the data server 200 (400) receives a local chunk copy request from the master server 100 in the write processing according to the first embodiment.

データサーバ２００（４００）はマスタサーバ１００からローカルチャンクコピー要求と、コピー元チャンク識別子と、書き込み対象チャンク識別子とを受信する（ステップＳ１８０１）。 The data server 200 (400) receives a local chunk copy request, a copy source chunk identifier, and a write target chunk identifier from the master server 100 (step S1801).

そして、データサーバ２００（４００）はローカルにチャンクをコピーする（ステップＳ１８０２）。具体的には、データサーバ２００（４００）はデータ記憶部２１０（４１０）のチャンク情報テーブル２１１（４１１）から当該コピー元チャンク識別子に対するチャンクの格納場所とチャンクのサイズとを取得する。データサーバ２００（４００）は当該チャンク格納場所に格納されているチャンクをコピーし、データ記憶部２１０（４１０）に格納する。また、データサーバ２００（４００）はデータ記憶部２１０（４１０）のチャンク情報テーブル２１１（４１１）に当該書き込み対象チャンク識別子と、コピーしたチャンクの格納場所と、当該チャンクのサイズとを登録する。 Then, the data server 200 (400) copies the chunk locally (step S1802). Specifically, the data server 200 (400) acquires the chunk storage location and chunk size for the copy source chunk identifier from the chunk information table 211 (411) of the data storage unit 210 (410). The data server 200 (400) copies the chunk stored in the chunk storage location and stores it in the data storage unit 210 (410). Further, the data server 200 (400) registers the write target chunk identifier, the storage location of the copied chunk, and the size of the chunk in the chunk information table 211 (411) of the data storage unit 210 (410).

続いて、データサーバ２００（４００）はマスタサーバ１００へローカルチャンクコピー応答を送信する（ステップＳ１８０３）。 Subsequently, the data server 200 (400) transmits a local chunk copy response to the master server 100 (step S1803).
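Steps S1801 to S1803 can be sketched with an in-memory stand-in for the data storage unit and its chunk information table. The class `ChunkStore`, its `slot-N` storage locations, and the response string are hypothetical; a real data server would copy on-disk data.

```python
class ChunkStore:
    """In-memory stand-in for a data storage unit and its chunk information table."""

    def __init__(self) -> None:
        self.blobs = {}       # storage location -> chunk bytes
        self.chunk_info = {}  # chunk identifier -> (storage location, size)
        self._next_slot = 0

    def store(self, chunk_id: str, data: bytes) -> None:
        """Store data at a fresh location and register it in the chunk information table."""
        location = f"slot-{self._next_slot}"
        self._next_slot += 1
        self.blobs[location] = bytes(data)
        self.chunk_info[chunk_id] = (location, len(data))

    def local_chunk_copy(self, source_id: str, target_id: str) -> str:
        """Copy a chunk within this data server under a new chunk identifier."""
        location, _size = self.chunk_info[source_id]  # S1802: look up the source
        self.store(target_id, self.blobs[location])   # S1802: copy and register
        return "local-chunk-copy-response"            # S1803: reply to the master
```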

図２４は、第１の実施形態における書き込み処理において、データサーバ６００がマスタサーバ１００からリモートチャンクコピー要求を受信したときの動作の一例を示すフローチャートである。 FIG. 24 is a flowchart illustrating an example of an operation when the data server 600 receives a remote chunk copy request from the master server 100 in the writing process according to the first embodiment.

データサーバ６００はマスタサーバ１００からリモートチャンクコピー要求と、コピー元チャンク識別子と、書き込み対象チャンク識別子と、コピー対象チャンク保持データサーバ識別子リストとを受信する（ステップＳ１９０１）。 The data server 600 receives a remote chunk copy request, a copy source chunk identifier, a write target chunk identifier, and a copy target chunk holding data server identifier list from the master server 100 (step S1901).

データサーバ６００は当該コピー対象チャンク保持データサーバ識別子リスト内のコピー対象チャンク保持データサーバ識別子で識別されるデータサーバからチャンクコピー元データサーバを選択する（ステップＳ１９０２）。例えば、最も距離の近いデータサーバをチャンクコピー元データサーバとする。ここでは、例として、データサーバ５００をチャンクコピー元データサーバとして選択したとする。 The data server 600 selects a chunk copy source data server from the data servers identified by the copy target chunk holding data server identifier in the copy target chunk holding data server identifier list (step S1902). For example, the closest data server is set as the chunk copy source data server. Here, as an example, it is assumed that the data server 500 is selected as the chunk copy source data server.

そして、データサーバ６００はデータサーバ５００へチャンク読み出し要求と、当該コピー元チャンク識別子とを送信する（ステップＳ１９０３）。続いて、データサーバ６００はデータサーバ５００からチャンクを受信する（ステップＳ１９０４）。 Then, the data server 600 transmits a chunk read request and the copy source chunk identifier to the data server 500 (step S1903). Subsequently, the data server 600 receives a chunk from the data server 500 (step S1904).

そして、データサーバ６００はデータ記憶部６１０に受信したチャンクを格納する（ステップＳ１９０５）。また、データサーバ６００はデータ記憶部６１０のチャンク情報テーブル６１１に当該書き込み対象チャンク識別子と、格納したチャンクの格納場所と、当該チャンクのサイズとを登録する。続いて、データサーバ６００はマスタサーバ１００へリモートチャンクコピー応答を送信する（ステップＳ１９０６）。 The data server 600 stores the received chunk in the data storage unit 610 (step S1905). Further, the data server 600 registers the write target chunk identifier, the storage location of the stored chunk, and the size of the chunk in the chunk information table 611 of the data storage unit 610. Subsequently, the data server 600 transmits a remote chunk copy response to the master server 100 (step S1906).

図２５は、第１の実施形態における書き込み処理において、データサーバ５００がデータサーバ６００からチャンク読み出し要求を受信したときの動作の一例を示すフローチャートである。 FIG. 25 is a flowchart illustrating an example of an operation when the data server 500 receives a chunk read request from the data server 600 in the writing process according to the first embodiment.

データサーバ５００はデータサーバ６００からチャンク読み出し要求と、コピー元チャンク識別子とを受信する（ステップＳ２００１）。そして、データサーバ５００はデータ記憶部５１０のチャンク情報テーブル５１１から当該コピー元チャンク識別子に対するチャンクの格納場所を取得し、データ記憶部５１０から当該チャンクを読み出す（ステップＳ２００２）。続いて、データサーバ５００は当該チャンクをデータサーバ６００へ送信する（ステップＳ２００３）。 The data server 500 receives the chunk read request and the copy source chunk identifier from the data server 600 (step S2001). The data server 500 acquires the chunk storage location for the copy source chunk identifier from the chunk information table 511 of the data storage unit 510, and reads the chunk from the data storage unit 510 (step S2002). Subsequently, the data server 500 transmits the chunk to the data server 600 (step S2003).
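The remote chunk copy of FIGS. 24 and 25 can be sketched as a single function, with network messages replaced by dict lookups. The nested-dict `stores` layout, the `distance` callable, and the name `remote_chunk_copy` are assumptions for illustration.

```python
from typing import Callable, Dict, List


def remote_chunk_copy(
    self_id: str,
    source_chunk_id: str,
    target_chunk_id: str,
    holders: List[str],
    stores: Dict[str, Dict[str, bytes]],
    distance: Callable[[str, str], float],
) -> str:
    """Fetch a chunk from the nearest holder and store it under a new identifier."""
    # S1902: pick the closest data server that holds the source chunk.
    source_ds = min(holders, key=lambda ds: distance(self_id, ds))
    # S1903-S1904 (and S2001-S2003 on the source side): read the chunk.
    data = stores[source_ds][source_chunk_id]
    # S1905: store the received chunk locally under the write target identifier.
    stores.setdefault(self_id, {})[target_chunk_id] = data
    # Returned only so the caller can see which source was used.
    return source_ds
```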

図２６は、第１の実施形態における書き込み処理において、データサーバがマスタサーバ１００からチャンク生成要求を受信したときの動作の一例を示すフローチャートである。なお、コピーオンライトを発生させる書き込みでは、図２６に示す処理は実行されない。図２６に示す処理は、チャンクが割り当てられていない位置への書き込みである図１９のステップＳ１７０７を実行したときに実行される。 FIG. 26 is a flowchart illustrating an example of an operation when the data server receives a chunk generation request from the master server 100 in the writing process according to the first embodiment. Note that the processing shown in FIG. 26 is not executed in the writing that causes copy-on-write. The processing shown in FIG. 26 is executed when step S1707 of FIG. 19 is executed, which is writing to a position where no chunk is allocated.

データサーバはマスタサーバ１００からチャンク生成要求と、書き込み対象チャンク識別子とを受信する（ステップＳ２１０１）。そして、データサーバはチャンクを生成し、チャンク情報テーブルを更新する（ステップＳ２１０２）。具体的には、データサーバは空のチャンクを生成し、データ記憶部に格納する。また、当該書き込み対象チャンク識別子と、当該チャンクの格納場所と、当該チャンクのサイズ（この時点では０）とをデータ記憶部のチャンク情報テーブルに登録する。その後、データサーバはマスタサーバ１００へチャンク生成応答を送信する（ステップＳ２１０３）。 The data server receives a chunk generation request and a write target chunk identifier from the master server 100 (step S2101). Then, the data server generates a chunk and updates the chunk information table (step S2102). Specifically, the data server generates an empty chunk and stores it in the data storage unit. In addition, the chunk identifier to be written, the storage location of the chunk, and the size of the chunk (0 at this time) are registered in the chunk information table of the data storage unit. Thereafter, the data server transmits a chunk generation response to the master server 100 (step S2103).

図２７は、第１の実施形態における書き込み処理において、プライマリデータサーバであるデータサーバ４００がマスタサーバ１００から書き込み制御委譲要求を受信したときの動作の一例を示すフローチャートである。 FIG. 27 is a flowchart illustrating an example of an operation when the data server 400 as the primary data server receives a write control delegation request from the master server 100 in the write processing according to the first embodiment.

データサーバ４００はマスタサーバ１００から書き込み制御委譲要求と、書き込み対象チャンク識別子と、書き込み対象チャンク保持データサーバ識別子リストとを受信する（ステップＳ２２０１）。 The data server 400 receives a write control delegation request, a write target chunk identifier, and a write target chunk holding data server identifier list from the master server 100 (step S2201).

そして、データサーバ４００は当該書き込み対象チャンク識別子と当該書き込み対象チャンク保持データサーバ識別子リストをメモリに保持する（ステップＳ２２０２）。続いて、データサーバ４００はマスタサーバ１００へ書き込み制御委譲応答を送信する（ステップＳ２２０３）。 Then, the data server 400 holds the write target chunk identifier and the write target chunk holding data server identifier list in the memory (step S2202). Subsequently, the data server 400 transmits a write control delegation response to the master server 100 (step S2203).

図２８は、第１の実施形態における書き込み処理において、データサーバが外部アプリケーション７００または他のデータサーバからデータ送信要求を受信したときの動作の一例を示すフローチャートである。ここでは、例として、データサーバ２００は外部アプリケーション７００から、データサーバ４００はデータサーバ２００から、データサーバ６００はデータサーバ４００から、それぞれデータ送信要求を受信したとする。 FIG. 28 is a flowchart illustrating an example of an operation when the data server receives a data transmission request from the external application 700 or another data server in the writing process according to the first embodiment. Here, as an example, it is assumed that the data server 200 receives a data transmission request from the external application 700, the data server 400 from the data server 200, and the data server 600 from the data server 400, respectively.

The data server 200 (400, 600) receives a data transmission request, the write target chunk identifier, the write data, and the write target chunk holding data server identifier list from the external application 700 (data server 200, 400) (step S2301).

そして、データサーバ２００（４００、６００）は当該書き込み対象チャンク識別子と当該書き込みデータとをメモリに保持する（ステップＳ２３０２）。 Then, the data server 200 (400, 600) holds the write target chunk identifier and the write data in the memory (step S2302).

続いて、データサーバ２００（４００、６００）は当該書き込み対象チャンク保持データサーバ識別子リストから自身のデータサーバ識別子を削除する（ステップＳ２３０３）。 Subsequently, the data server 200 (400, 600) deletes its own data server identifier from the write target chunk holding data server identifier list (step S2303).

そして、データサーバ２００（４００、６００）は当該書き込み対象チャンク保持データサーバ識別子リストが空ならば、ステップＳ２３０７へ、そうでなければ、ステップＳ２３０５へ遷移する（ステップＳ２３０４）。ここでは、データサーバ２００、４００はステップＳ２３０５へ、データサーバ６００はステップＳ２３０７へそれぞれ遷移することになる。 If the write target chunk holding data server identifier list is empty, the data server 200 (400, 600) proceeds to step S2307; otherwise, the process proceeds to step S2305 (step S2304). Here, the data servers 200 and 400 transition to step S2305, and the data server 600 transitions to step S2307.

データサーバ２００（４００）は当該書き込み対象チャンク保持データサーバ識別子リスト内のデータサーバ識別子で識別されるデータサーバから、最も距離の近いデータサーバを選択し、選択したデータサーバへデータ送信要求と、当該書き込み対象チャンク識別子と、当該書き込みデータと、当該書き込み対象チャンク保持データサーバ識別子リストとを送信する（ステップＳ２３０５）。 The data server 200 (400) selects the data server having the closest distance from the data servers identified by the data server identifier in the write target chunk holding data server identifier list, and sends a data transmission request to the selected data server, The write target chunk identifier, the write data, and the write target chunk holding data server identifier list are transmitted (step S2305).

ここでは、最も距離の近いデータサーバとして、データサーバ２００はデータサーバ４００を、データサーバ４００はデータサーバ６００をそれぞれ選択することとする。 Here, it is assumed that the data server 200 selects the data server 400 and the data server 400 selects the data server 600 as the closest data server.

そして、データサーバ２００（４００）はデータサーバ４００（６００）からデータ送信応答を受信する（ステップＳ２３０６）。データサーバ２００（４００）はデータ送信応答を受信すると、外部アプリケーション７００（データサーバ２００）へデータ送信応答を送信する（ステップＳ２３０７）。また、データサーバ６００は書き込み対象チャンク保持データサーバ識別子リストが空になると、データサーバ４００へデータ送信応答を送信する（ステップＳ２３０７）。 Then, the data server 200 (400) receives a data transmission response from the data server 400 (600) (step S2306). When receiving the data transmission response, the data server 200 (400) transmits the data transmission response to the external application 700 (data server 200) (step S2307). In addition, when the write target chunk holding data server identifier list becomes empty, the data server 600 transmits a data transmission response to the data server 400 (step S2307).
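The chained forwarding of write data in FIG. 28 can be sketched recursively: each data server buffers the data, removes itself from the list, and forwards to the nearest remaining server. The function `push_data`, the `buffers` dict, and the returned visit order are hypothetical; the return from each call stands in for the data transmission response.

```python
from typing import Callable, Dict, List


def push_data(
    receiver: str,
    chunk_id: str,
    data: bytes,
    remaining: List[str],
    buffers: Dict[str, Dict[str, bytes]],
    distance: Callable[[str, str], float],
) -> List[str]:
    """Buffer data on `receiver`, forward it down the chain, return the visit order."""
    buffers.setdefault(receiver, {})[chunk_id] = data          # S2302: hold in memory
    remaining = [ds for ds in remaining if ds != receiver]     # S2303: drop own identifier
    visited = [receiver]
    if remaining:                                              # S2304: list not empty
        nearest = min(remaining, key=lambda ds: distance(receiver, ds))
        # S2305-S2306: forward and wait for the downstream response.
        visited += push_data(nearest, chunk_id, data, remaining, buffers, distance)
    return visited                                             # S2307: respond upstream
```

With the example from the text, a call starting at data server 200 visits 200, then 400, then 600, and the responses unwind in the reverse order.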

FIG. 29 is a flowchart illustrating an example of the operation performed when the data server 400, which is the primary data server, receives a write request from the external application 700 in the write processing according to the first embodiment.

The data server 400 receives a write request, a write target chunk identifier, and a write position from the external application 700 (step S2401).

Then, the data server 400 acquires the storage location of the chunk corresponding to the write target chunk identifier from the chunk information table 411 of the data storage unit 410 (step S2402).

Next, the data server 400 writes the write data corresponding to the write target chunk identifier, which was held in step S2302 of FIG. 28, to the write position in the chunk (step S2403).

The data server 400 executes the processing of steps S2406 to S2407 for the data server identifier of every secondary data server in the write target chunk holding data server identifier list held in step S2202. In this example, the processing is executed for the data server 200 and the data server 600, which are the secondary data servers.

First, an index indicating the position of a data server identifier in the write target chunk holding data server identifier list is set to the second entry of the list (step S2404). Here, the data server identifier of the primary data server is assumed to be at the head of the list, and the data server identifiers of the secondary data servers are assumed to occupy the second and subsequent positions.

Then, if the data server identifiers of all the secondary data servers in the write target chunk holding data server identifier list have been processed, the data server 400 proceeds to step S2409; otherwise, it proceeds to step S2406 (step S2405).

If the data server identifiers of all the secondary data servers in the write target chunk holding data server identifier list have not yet been processed, the data server 400 transmits a secondary write request, the write target chunk identifier, and the write position to the data server identified by the data server identifier at the index position in the list (step S2406).

Next, the data server 400 receives a secondary write response from the data server identified by the data server identifier at the index position in the write target chunk holding data server identifier list (step S2407). The data server 400 then advances the index to the next position in the list (step S2408).

When the data server identifiers of all the secondary data servers in the write target chunk holding data server identifier list have been processed, the data server 400 transmits a write response to the external application 700 (step S2409).
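The primary-side loop of FIG. 29 can be sketched as below. The function name `primary_write`, the two callbacks, and the response strings are hypothetical; network transport is replaced by plain function calls, and the error raised on an unexpected response is an assumption, since the description does not cover failure handling.

```python
from typing import Callable, List


def primary_write(holder_ids: List[int],
                  write_local: Callable[[], None],
                  send_secondary_write: Callable[[int], str]) -> str:
    """Apply a write on the primary, then on each secondary in list order."""
    # S2402 to S2403: look up the chunk and write the held data locally.
    write_local()
    # S2404: start at the second entry; the primary itself is at the head (0-based index 0).
    index = 1
    # S2405: loop until every secondary identifier has been processed.
    while index < len(holder_ids):
        # S2406: send the secondary write request; S2407: wait for its response.
        response = send_secondary_write(holder_ids[index])
        if response != "secondary-write-response":
            # Assumed behaviour: the description does not say what happens on failure.
            raise RuntimeError(f"unexpected response from data server {holder_ids[index]}")
        # S2408: advance the index to the next position.
        index += 1
    # S2409: all secondaries have answered, so reply to the external application.
    return "write-response"


log = []
result = primary_write(
    [400, 200, 600],
    write_local=lambda: log.append("local"),
    send_secondary_write=lambda sid: (log.append(sid), "secondary-write-response")[1],
)
print(log, result)
```

In this sketch the secondaries are contacted one at a time, matching the index-based loop in the flowchart; the write response is returned only after every secondary has responded.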

FIG. 30 is a flowchart illustrating an example of the operation performed when the data server 200 (600), which is a secondary data server, receives a secondary write request from the data server 400, which is the primary data server, in the write processing according to the first embodiment.

The data server 200 (600) receives a secondary write request, a write target chunk identifier, and a write position from the data server 400 (step S2501).

Then, the data server 200 (600) acquires the storage location of the chunk corresponding to the write target chunk identifier from the chunk information table 211 (611) of the data storage unit 210 (610) (step S2502).

Next, the data server 200 (600) writes the write data corresponding to the write target chunk identifier, which was held in step S2302 of FIG. 28, to the write position in the chunk (step S2503). The data server 200 (600) then transmits a secondary write response to the data server 400 (step S2504).
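The secondary-side handling of FIG. 30 can be sketched in the same style. The `SecondaryServer` class, its in-memory `storage` dictionary, and the bounds check are assumptions for illustration; only the lookup in a chunk information table and the write of previously held data come from the description.

```python
class SecondaryServer:
    """Hypothetical in-memory model of a secondary data server."""

    def __init__(self):
        self.chunk_info_table = {}  # chunk_id -> storage location of the chunk
        self.storage = {}           # storage location -> bytearray holding the chunk
        self.held = {}              # chunk_id -> write data held since step S2302

    def handle_secondary_write(self, chunk_id: str, offset: int) -> str:
        # S2501: the request carries the chunk identifier and the write position.
        # S2502: look up where the chunk is stored.
        location = self.chunk_info_table[chunk_id]
        chunk = self.storage[location]
        data = self.held.pop(chunk_id)
        # Assumed guard: chunks are fixed length, so a write must not grow the chunk.
        if offset < 0 or offset + len(data) > len(chunk):
            raise ValueError("write does not fit inside the fixed-length chunk")
        # S2503: write the held data at the write position.
        chunk[offset:offset + len(data)] = data
        # S2504: answer the primary data server.
        return "secondary-write-response"


server = SecondaryServer()
server.chunk_info_table["chunk-1"] = "/data/chunk-1"
server.storage["/data/chunk-1"] = bytearray(b"aaaaaaaa")
server.held["chunk-1"] = b"XY"
reply = server.handle_secondary_write("chunk-1", offset=2)
print(bytes(server.storage["/data/chunk-1"]))
```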

[Effects of the first embodiment]
As described above, when the master server 100 according to the first embodiment accepts a write request to a snapshot of file data that is stored in a plurality of data servers and divided into fixed-length chunks, it determines the data server to which the chunk targeted by the write request is copied by copy-on-write on the basis of one or more of the following: the free space of the data storage unit in each data server, the distance between the data server holding the chunk to be copied and each data server, the load state of each data server, and the number of chunks shared by a plurality of files in each data server. The master server 100 then controls the determined data server so that the chunk targeted by the write request is copied.

This makes it possible, when a chunk is copied by copy-on-write, to determine the data server that becomes the copy destination of the chunk according to a copy-on-write strategy. A copy-on-write strategy determines the copy destination data server from any one of, or any combination of, factors such as the free space of the data storage unit in each data server, the network distance between data servers, the load state of each data server, and the number of chunks shared by a plurality of files in each data server. By using a copy-on-write strategy suited to the expected use case, it is possible to prevent chunk copies from failing because a data server's data storage unit lacks free space, and to distribute load among the data servers.

In other words, in copy-on-write performed when writing to a snapshot, the chunk is not unconditionally copied locally within the data server. Instead, the free space of the data server's data storage unit, the network distance between data servers, the load on the data servers, and similar factors are taken into account to decide whether to copy the chunk locally within the data server or to another data server. This avoids writes becoming impossible because of insufficient free space in the data storage unit, and distributes the load on the data servers.
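One way to picture such a copy-on-write strategy is the sketch below. It is an illustration under assumptions, not the method defined by the embodiment: the `ServerState` fields, the strategy names `"nearest"` and `"least_loaded"`, and the numeric values are hypothetical. Distances are measured from the data server that currently holds the chunk, so that server has distance 0 and is chosen for a local copy whenever it has enough free space.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ServerState:
    server_id: int
    free_bytes: int     # free space of the data storage unit
    load: float         # load state; lower means less busy (assumed scale)


def choose_copy_destination(states: List[ServerState],
                            distance_from_holder: Dict[int, int],
                            min_free_bytes: int,
                            strategy: str = "nearest") -> Optional[int]:
    """Return the identifier of the copy destination, or None if no server qualifies."""
    # Filter first: only servers with enough free space are candidates.
    candidates = [s for s in states if s.free_bytes >= min_free_bytes]
    if not candidates:
        return None
    if strategy == "nearest":
        # Prefer the candidate closest to the server holding the chunk to be copied.
        return min(candidates, key=lambda s: distance_from_holder[s.server_id]).server_id
    if strategy == "least_loaded":
        # Prefer the candidate with the lowest load state.
        return min(candidates, key=lambda s: s.load).server_id
    raise ValueError(f"unknown strategy: {strategy}")


states = [
    ServerState(200, free_bytes=10, load=0.2),   # holds the chunk, but is nearly full
    ServerState(400, free_bytes=500, load=0.9),
    ServerState(600, free_bytes=900, load=0.1),
]
distances = {200: 0, 400: 1, 600: 2}
print(choose_copy_destination(states, distances, min_free_bytes=100))
print(choose_copy_destination(states, distances, min_free_bytes=100, strategy="least_loaded"))
```

Filtering on free space before ranking reflects the stated goal of avoiding copies that fail for lack of space; the ranking criterion is what changes between strategies.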

(System configuration, etc.)
The components of each illustrated device are functional and conceptual, and need not be physically configured as illustrated. That is, the specific form of distribution and integration of the devices is not limited to the illustrated one, and all or part of them may be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. For example, the determination unit 113 and the control unit 114 may be integrated. Furthermore, all or any part of the processing functions performed by each device may be realized by a CPU and a program analyzed and executed by the CPU, or may be realized as hardware using wired logic.

Of the processes described in the present embodiment, all or part of the processes described as being performed automatically may be performed manually, and all or part of the processes described as being performed manually may be performed automatically by a known method. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and drawings may be changed arbitrarily unless otherwise specified.

(Program)
It is also possible to create a program in which the processing executed by the master server 100 or the data server 200 (300, 400, 500, 600) according to the above embodiment is described in a computer-executable language. In this case, the same effects as in the above embodiment can be obtained by having a computer execute the program. Furthermore, the same processing as in the above embodiment may be realized by recording the program on a computer-readable recording medium and having a computer read and execute the program recorded on the recording medium. An example of a computer that executes a snapshot control program realizing the same functions as the master server 100 is described below.

FIG. 31 is a diagram illustrating a computer that executes the snapshot control program. As illustrated in FIG. 31, the computer 10000 includes, for example, a memory 10100, a CPU 10200, a hard disk drive interface 10300, a disk drive interface 10400, a serial port interface 10500, a video adapter 10600, and a network interface 10700. These units are connected by a bus 10800.

The memory 10100 includes a ROM (Read Only Memory) 10110 and a RAM (Random Access Memory) 10120. The ROM 10110 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 10300 is connected to a hard disk drive 10900. The disk drive interface 10400 is connected to a disk drive 10410. A removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 10410. For example, a mouse 11100 and a keyboard 11200 are connected to the serial port interface 10500, and a display 11300 is connected to the video adapter 10600.

As illustrated in FIG. 31, the hard disk drive 10900 stores, for example, an OS 10910, an application program 10920, a program module 10930, and program data 10940. Each table described in the above embodiment is stored in, for example, the hard disk drive 10900 or the memory 10100.

The snapshot control program is stored in the hard disk drive 10900 as, for example, a program module in which instructions to be executed by the computer 10000 are described. Specifically, a program module describing each process executed by the master server 100 described in the above embodiment is stored in the hard disk drive 10900.

Data used for information processing by the snapshot control program is stored as program data in, for example, the hard disk drive 10900. The CPU 10200 reads the program module 10930 and the program data 10940 stored in the hard disk drive 10900 into the RAM 10120 as necessary and executes the procedures described above.

The program module 10930 and the program data 10940 of the snapshot control program are not limited to being stored in the hard disk drive 10900. For example, they may be stored in a removable storage medium and read by the CPU 10200 via the disk drive 10410 or the like. Alternatively, they may be stored in another computer connected via a network such as a LAN (Local Area Network) or a WAN (Wide Area Network) and read by the CPU 10200 via the network interface 10700.

DESCRIPTION OF SYMBOLS
100 Master server
110 File management unit
111 File table
112 Chunk table
113 Determination unit
114 Control unit
120 Data server state management unit
121 Data server state management table
200, 300, 400, 500, 600 Data server
210, 310, 410, 510, 610 Data storage unit
211, 311, 411, 511, 611 Chunk information table
220, 320, 420, 520, 620 Data access unit
230, 330, 430, 530, 630 State notification unit
700 External application
800 Network
900 File data
910, 920, 930, 1010, 1020, 1030, 1130 Chunk
1000 File 1
1100 File 2
10000 Computer
10100 Memory
10110 ROM
10120 RAM
10200 CPU
10300 Hard disk drive interface
10400 Disk drive interface
10410 Disk drive
10500 Serial port interface
10600 Video adapter
10700 Network interface
10800 Bus
10900 Hard disk drive
10910 OS
10920 Application program
10930 Program module
10940 Program data
11100 Mouse
11200 Keyboard
11300 Display

Claims

A snapshot control apparatus comprising: a determination unit that, when a write request to a snapshot of file data that is stored in a plurality of data servers and divided into fixed-length chunks is accepted, determines the data server to be the copy destination, by copy-on-write, of the chunk targeted by the write request, on the basis of one or more of the free space of the data storage unit in each data server, the distance between the data server holding the chunk to be copied and each data server, the load state of each data server, and the number of chunks shared by a plurality of files in each data server; and
a control unit that controls the data server determined by the determination unit so that the chunk targeted by the write request is copied.

The snapshot control apparatus according to claim 1, wherein the determination unit determines, as the copy destination data server, the data server with the smallest distance from the data server holding the chunk to be copied, from among the data servers in which the free space of the data storage unit is equal to or greater than a predetermined threshold.

The snapshot control apparatus according to claim 1, wherein the determination unit determines, as the copy destination data server, the data server with the lowest load state from among the data servers in which the free space of the data storage unit is equal to or greater than a predetermined threshold.

The snapshot control apparatus according to claim 1, wherein the determination unit determines, as the copy destination data server, a data server in which the free space of the data storage unit is equal to or greater than a predetermined threshold and in which the number of chunks shared by a plurality of files is less than a predetermined threshold.

A snapshot control method executed by a snapshot control apparatus, the method comprising:
a determination step of, when a write request to a snapshot of file data that is stored in a plurality of data servers and divided into fixed-length chunks is accepted, determining the data server to be the copy destination, by copy-on-write, of the chunk targeted by the write request, on the basis of one or more of the free space of the data storage unit in each data server, the distance between the data server holding the chunk to be copied and each data server, the load state of each data server, and the number of chunks shared by a plurality of files in each data server; and
a control step of controlling the data server determined in the determination step so that the chunk targeted by the write request is copied.

A snapshot control program that causes a computer to execute: a determination step of, when a write request to a snapshot of file data that is stored in a plurality of data servers and divided into fixed-length chunks is accepted, determining the data server to be the copy destination, by copy-on-write, of the chunk targeted by the write request, on the basis of one or more of the free space of the data storage unit in each data server, the distance between the data server holding the chunk to be copied and each data server, the load state of each data server, and the number of chunks shared by a plurality of files in each data server; and
a control step of controlling the data server determined in the determination step so that the chunk targeted by the write request is copied.