JP6201340B2

JP6201340B2 - Replication system

Info

Publication number: JP6201340B2
Application number: JP2013036614A
Authority: JP
Inventors: 孝紘伊東
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2013-02-27
Filing date: 2013-02-27
Publication date: 2017-09-27
Anticipated expiration: 2033-02-27
Also published as: JP2014164634A

Description

The present invention relates to a replication system, and more particularly to a replication system including storage systems that have a function of eliminating duplicate storage.

In recent years, with the development and spread of computers, various kinds of information have been converted into digital data. Devices for storing such digital data include storage devices such as magnetic tapes and magnetic disks. Because the data to be stored grows day by day and reaches an enormous volume, large-capacity storage systems are required. Reliability is also required while the cost spent on storage devices is reduced. In addition, the data must be easy to retrieve later. Consequently, there is a demand for a storage system that can increase its storage capacity and performance automatically, that reduces storage cost by eliminating duplicate storage, and that has high redundancy.

In response to this situation, content address storage systems have been developed in recent years, as shown in Patent Document 1. A content address storage system distributes data across a plurality of storage devices, and the location where the data is stored is specified by a unique content address determined from the content of the data. Some content address storage systems divide given data into a plurality of fragments, add further fragments that serve as redundant data, and store these fragments in a plurality of storage devices, respectively.

In such a content address storage system, designating a content address later makes it possible to read the data stored at the storage location specified by that content address, that is, the fragments, and to restore the original data as it was before division from the plurality of fragments.

The content address is generated based on a value that is unique to the content of the data, for example a hash value of the data. Data with identical content can therefore be obtained by referring to the data at the same storage location. Accordingly, duplicate data need not be stored separately, duplicate recording is eliminated, and the data volume is reduced.

In particular, a storage system having such a function of eliminating duplicate storage divides data to be written, such as a file, into a plurality of pieces of block data of a predetermined size, compresses them, and writes them to a storage device. Eliminating duplicate storage in units of the block data into which a file is divided reduces the data volume. When the storage system is used for data backup, this saves backup capacity and saves bandwidth during replication. Here, replication means transmitting data that exists in one system to another system via a network or the like and duplicating it there, for purposes such as data preservation.

In replication between such deduplication storage systems, before data is transferred from the replication source system (hereinafter, the master system) to the replication destination system (hereinafter, the replica system), a check is made as to whether the data already exists in the replica system. Transfer from the master system can be omitted for data that already exists in the replica system, which reduces the amount of transferred data. Specifically, the master system sends the digest (hash value) of the data it intends to transfer to the replica system, and the replica system uses the digest to check whether the same data exists in its own system. The replica system then requests the master system to send only the data that does not exist.
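The digest exchange described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Replica` class, the `replicate` function, and the use of SHA-256 as the digest are all assumptions made for the example (the text says only "hash value").

```python
import hashlib


def digest(block: bytes) -> str:
    """Digest of a block (SHA-256 is an assumed choice of hash)."""
    return hashlib.sha256(block).hexdigest()


class Replica:
    """Hypothetical replica system holding blocks keyed by digest."""

    def __init__(self):
        self.store = {}  # digest -> block content

    def missing(self, digests):
        """Existence check: return the digests the replica does not hold."""
        return [d for d in digests if d not in self.store]

    def receive(self, block: bytes):
        """Store a block that the master actually transferred."""
        self.store[digest(block)] = block


def replicate(blocks, replica: Replica) -> int:
    """Send digests first, then only the blocks the replica asks for.

    Returns the number of bytes that were actually transferred.
    """
    by_digest = {digest(b): b for b in blocks}
    requested = replica.missing(list(by_digest))
    sent = 0
    for d in requested:
        replica.receive(by_digest[d])
        sent += len(by_digest[d])
    return sent
```

Running `replicate` twice with the same blocks transfers nothing the second time, because every digest is already present on the replica.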

Here, the data that exists in the master system and is subject to replication can be classified into the following four types.
A: At the time of writing in the master system, the data already existed in the master system, so it was deduplicated and no data was actually written; the same data also exists in the replica system.
B: At the time of writing in the master system, the data did not exist in the master system, so it was actually written; the same data exists in the replica system.
C: At the time of writing in the master system, the data already existed in the master system, so it was deduplicated and no data was actually written; the same data does not exist in the replica system.
D: At the time of writing in the master system, the data did not exist in the master system, so it was actually written; the same data does not exist in the replica system.
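The four-way classification above can be expressed as a small function over two facts about each block. The function and parameter names below are hypothetical, chosen only for this sketch.

```python
from collections import Counter


def classify(written_on_master: bool, exists_on_replica: bool) -> str:
    """Return the category A, B, C or D for one block.

    written_on_master: True if the block was actually written on the master
                       (False means it was deduplicated there).
    exists_on_replica: True if the same content already exists on the replica.
    """
    if not written_on_master:
        return "A" if exists_on_replica else "C"
    return "B" if exists_on_replica else "D"


def tally(blocks) -> Counter:
    """Sum block sizes per category.

    blocks: iterable of (size, written_on_master, exists_on_replica).
    """
    sizes = Counter()
    for size, written, on_replica in blocks:
        sizes[classify(written, on_replica)] += size
    return sizes
```

With such a tally, B + D is the amount newly written on the master, while C + D is the amount that must actually cross the network.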

Before replication starts, the only figures the master system can know are the total size of the replication target data (A + B + C + D) and, within that, the total size of the data that was newly written without being deduplicated (B + D).

Data classified as A or B is not transferred over the network; only its existence in the replica system is checked, so it is processed very quickly. Data classified as C or D, by contrast, must be read in the master system, transferred over the network, and written in the replica system after the existence check, so its processing is very slow. In other words, the processing time of replication is almost entirely taken up by the processing of data classified as C or D.

Replication often takes a very long time, particularly when the network bandwidth between the master system and the replica system is narrow. There is therefore a strong need for a progress rate that increases linearly from the start to the end of replication and from which the time remaining until completion is easy to predict. For example, Patent Document 2 describes predicting the data transfer time when data is replicated.

Patent Document 1: JP 2005-235171 A
Patent Document 2: JP 2012-247981 A

To calculate the replication progress rate, the total amount of data classified as C or D must be known, for the reason given above. However, it is difficult for the master system, which holds the data to be replicated, to know this amount without querying the replica system in advance. In other words, it is difficult for the master system to know this total before replication completes.

On the other hand, the only values the master system can obtain before executing replication are "A + B + C + D" and "B + D". Conventionally, therefore, it has been considered to assume "B + D" in place of "C + D" as the total amount of replication processing, and to calculate the progress rate at a given point in replication by dividing the amount of data transferred since the start of replication by "B + D".

However, when the difference between "B + D" and "C + D" is large, the following problems occur.
- When (B + D) > (C + D): the progress rate stays at a low value and hardly rises, and then replication completes suddenly.
- When (B + D) < (C + D): the progress rate rises to almost 100% partway through replication.

In either case, a prediction of the time remaining until replication completes that is based on the progress rate at a given point will be far off. Thus, there has been a problem that, in a deduplication storage system, it is difficult to calculate a replication progress rate that is close to the actual state.
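The two failure modes of the conventional calculation can be reproduced with small numbers. The function names and the sample sizes below are illustrative assumptions; the formula itself (transferred amount divided by B + D) is the one described above.

```python
def conventional_progress(transferred: int, b_plus_d: int) -> float:
    """Conventional estimate: bytes transferred so far divided by B + D."""
    return transferred / b_plus_d


def true_progress(transferred: int, c_plus_d: int) -> float:
    """Actual progress: bytes transferred so far divided by C + D."""
    return transferred / c_plus_d


# Case (B + D) > (C + D): B + D = 100, but only C + D = 20 must be sent.
# When all 20 units have been sent, the indicator still reads 20%,
# and replication then completes abruptly.
stuck_low = conventional_progress(20, 100)

# Case (B + D) < (C + D): B + D = 20, but C + D = 100 must be sent.
# After 20 units the indicator already reads 100%,
# although only 20% of the work is done.
early_full = conventional_progress(20, 20)
actual = true_progress(20, 100)
```

In both cases the indicator is not proportional to the work remaining, so a remaining-time estimate derived from it is unreliable.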

Accordingly, an object of the present invention is to provide a replication system that can solve the above-described problem, namely that it is difficult in a deduplication storage system to calculate a replication progress rate that is close to the actual state.

A replication system according to one aspect of the present invention is configured as follows.
A master-side storage system and a replica-side storage system, each including a storage device, have a function of storing data in the storage device and, when other data having the same content as data already stored in the storage device is to be stored, eliminating duplicate storage by referring to the already stored data as the other data.
The replication system includes:
a replication execution unit that replicates data stored in the storage device of the master-side storage system to the storage device of the replica-side storage system; and
a progress rate calculation unit that calculates the progress rate of the replication at a given point in the replication.
The replication execution unit investigates whether the data to be replicated from the master-side storage system to the replica-side storage system will be stored in the storage device of the replica-side storage system in deduplicated form, and transfers, from the master-side storage system to the replica-side storage system, that part of the data to be replicated which will be stored in the replica-side storage system without being deduplicated.
The progress rate calculation unit estimates, based on the result of investigating the data to be replicated, the total amount of data that, over the entire replication, is transferred from the master-side storage system to the replica-side storage system and newly stored in the storage device of the replica-side storage system without being deduplicated, and calculates the progress rate of the replication at the given point based on the estimated total amount and on the amount of data that has been transferred from the master-side storage system to the replica-side storage system by that point.
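The progress rate calculation described above might be sketched as follows. This passage states only that the total is estimated from the investigation result; the proportional extrapolation used here, and all function and parameter names, are assumptions made for illustration.

```python
def estimate_transfer_total(total_target: int,
                            investigated: int,
                            investigated_needing_transfer: int) -> float:
    """Estimate the total amount that must be transferred (C + D).

    Assumed rule: the fraction of already investigated data that needed
    transfer is taken to hold for the whole replication target.
    """
    if investigated == 0:
        # Nothing investigated yet: assume, pessimistically, that all data is sent.
        return float(total_target)
    return total_target * investigated_needing_transfer / investigated


def progress_rate(transferred: int, estimated_total: float) -> float:
    """Progress at this point: transferred amount over the estimated total."""
    if estimated_total == 0:
        return 1.0  # nothing needs to be transferred
    return min(1.0, transferred / estimated_total)
```

For example, if 200 of 1000 units have been investigated and 50 of those needed transfer, the estimated total is 250 units, and having transferred 50 units corresponds to a progress rate of 20%. The estimate is refined as more data is investigated.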

A program according to another aspect of the present invention is used where a master-side storage system and a replica-side storage system, each including a storage device, have a function of storing data in the storage device and, when other data having the same content as data already stored in the storage device is to be stored, eliminating duplicate storage by referring to the already stored data as the other data.
The program causes an information processing apparatus that controls replication between the master-side storage system and the replica-side storage system to realize:
a replication execution unit that replicates data stored in the storage device of the master-side storage system to the storage device of the replica-side storage system; and
a progress rate calculation unit that calculates the progress rate of the replication at a given point in the replication.
The replication execution unit investigates whether the data to be replicated from the master-side storage system to the replica-side storage system will be stored in the storage device of the replica-side storage system in deduplicated form, and transfers, from the master-side storage system to the replica-side storage system, that part of the data to be replicated which will be stored in the replica-side storage system without being deduplicated.
The progress rate calculation unit estimates, based on the result of investigating the data to be replicated, the total amount of data that, over the entire replication, is transferred from the master-side storage system to the replica-side storage system and newly stored in the storage device of the replica-side storage system without being deduplicated, and calculates the progress rate of the replication at the given point based on the estimated total amount and on the amount of data that has been transferred from the master-side storage system to the replica-side storage system by that point.

A replication method according to another aspect of the present invention is a method of replication between a master-side storage system and a replica-side storage system, each of which includes a storage device and has a function of storing data in the storage device and, when other data having the same content as data already stored in the storage device is to be stored, eliminating duplicate storage by referring to the already stored data as the other data.
The method includes:
a replication execution step of replicating data stored in the storage device of the master-side storage system to the storage device of the replica-side storage system; and
a progress rate calculation step of calculating the progress rate of the replication at a given point in the replication.
The replication execution step investigates whether the data to be replicated from the master-side storage system to the replica-side storage system will be stored in the storage device of the replica-side storage system in deduplicated form, and transfers, from the master-side storage system to the replica-side storage system, that part of the data to be replicated which will be stored in the replica-side storage system without being deduplicated.
The progress rate calculation step estimates, based on the result of investigating the data to be replicated, the total amount of data that, over the entire replication, is transferred from the master-side storage system to the replica-side storage system and newly stored in the storage device of the replica-side storage system without being deduplicated, and calculates the progress rate of the replication at the given point based on the estimated total amount and on the amount of data that has been transferred from the master-side storage system to the replica-side storage system by that point.

With the above configuration, the present invention makes it possible to calculate, in a storage system, a replication progress rate that is close to the actual state.

FIG. 1 is a block diagram showing the configuration of the replication system in Embodiment 1 of the present invention.
FIG. 2 is a block diagram showing an outline of the configuration of the master system or the replica system disclosed in FIG. 1.
FIG. 3 is an explanatory diagram for describing data write processing in the master system or the replica system disclosed in FIG. 1.
FIG. 4 is an explanatory diagram for describing data write processing in the master system or the replica system disclosed in FIG. 1.
FIG. 5 is a diagram showing the structure of the metadata stored in the master system disclosed in FIG. 1.
FIG. 6 is a diagram showing the classification of the data replicated by the replication system disclosed in FIG. 1.
FIG. 7 is a flowchart showing the operation of file write processing by the replication system disclosed in FIG. 1.
FIG. 8 is a flowchart showing the operation of replication processing by the replication system disclosed in FIG. 1.
FIG. 9 is a flowchart showing the operation of progress rate calculation processing by the replication system disclosed in FIG. 1.
FIG. 10 is a block diagram showing the configuration of the storage system in Supplementary Note 1 of the present invention.

<Embodiment 1>
A first embodiment of the present invention will be described with reference to FIGS. 1 to 9. FIG. 1 is a block diagram showing the configuration of the entire replication system. FIG. 2 is a block diagram showing an outline of a storage system constituting the replication system. FIGS. 3 to 6 are explanatory diagrams for describing the operation of data write processing and replication processing in the replication system. FIGS. 7 to 9 are flowcharts showing the operation of the replication system.

This embodiment shows a specific example of the master-side storage system and the replica-side storage system that constitute the replication system described in the supplementary notes below. The following describes a case in which each storage system is formed by connecting a plurality of server computers. However, each storage system constituting the replication system of the present invention is not limited to being formed of a plurality of computers and may be formed of a single computer.

[Configuration]
As shown in FIG. 1, in the replication system of this embodiment, a master system 10 and a replica system 20, which are storage systems, are connected via a network. A user system 30 is connected to the master system 10.

First, the user system 30 reads and writes data on the master system 10 through ordinary file and directory operations, via a file system view providing unit 14 on the master system 10, described later. The user system 30 also requests the replication progress rate, described later, from the master system 10 and views the progress rate calculated in response.

The master system 10 includes an arithmetic device (not shown) and a storage device 11, and includes a master-side replication execution unit 12, a data classification information management unit 13, a file system view providing unit 14, and a progress information browsing unit 15, each constructed by installing a program in the arithmetic device. The replica system 20 includes an arithmetic device (not shown) and a storage device 21, and includes a replica-side replication execution unit 22 constructed by installing a program in the arithmetic device.

In response to requests from the user system 30, the file system view providing unit 14 writes data to the storage device 11 and reads data from the storage device 11, and provides a file system view to the user system 30. The master system 10 has a function of eliminating duplicate storage when writing data to the storage device 11. The replica system 20 stores the data replicated from the master system 10 in the storage device 21, and the replica system 20 also has a function of eliminating duplicate storage. The function of eliminating duplicate storage provided in the storage system 1, which serves as the master system 10 and the replica system 20, is described below with reference to FIGS. 2 to 4.

As shown in FIG. 2, the storage system 1, which serves as the master system 10 and the replica system 20 in this embodiment, has a configuration in which a plurality of server computers are connected. Specifically, the storage system 1 includes an accelerator node 2, which is a server computer that controls the storage and reproduction operations of the storage system 1 itself, and a storage node 3, which is a server computer including a storage device that stores data. The numbers of accelerator nodes 2 and storage nodes 3 are not limited to those shown in FIG. 2; more nodes 2 and 3 may be connected.

Furthermore, the storage system 1 in this embodiment is a content address storage system: it divides data and makes it redundant, distributes it across a plurality of storage devices, and specifies the location where the data is stored by a unique content address set according to the content of the stored data.

In the following, the configuration and functions of the storage system 1 are described on the assumption that the storage system 1 is a single system. That is, the configuration and functions of the storage system 1 described below may be provided in either the accelerator node 2 or the storage node 3. The storage system 1 is not necessarily limited to including the accelerator node 2 and the storage node 3 as shown in FIG. 2; it may have any configuration and may, for example, be formed of a single computer. Furthermore, the storage system 1 is not limited to being a content address storage system; it may be any storage system having a deduplication function.

Next, data write processing and read processing using content addresses in the storage system 1 of this embodiment will be described with reference to FIGS. 3 and 4.

First, as indicated by arrow Y1 in FIGS. 3 and 4, the storage system 1 receives input of a file A for which writing has been requested. Then, as indicated by arrow Y2 in FIGS. 3 and 4, the storage system 1 divides file A into block data D of a predetermined size (for example, 64 KB).

Next, based on the data content of each piece of divided block data D, a unique hash value H representing that data content is calculated (arrow Y3 in FIG. 4). For example, the hash value H is calculated from the data content of the block data D using a preset hash function.

Next, the hash value H of the block data D of file A is used to check whether that block data D has already been stored. Specifically, for block data D that has already been stored, its hash value H and a content address CA representing its storage location are registered in association with each other in an MFI (Main Fragment Index) file. Therefore, if the hash value H of the block data D calculated before storage exists in the MFI file, it can be determined that block data D with the same content has already been stored (arrow Y4 in FIG. 4). In this case, the content address CA associated with the hash value H in the MFI file that matches the hash value H of the block data D before storage is obtained from the MFI file. This content address CA is then returned as the content address CA of the block data D for which writing was requested.

The already stored data referred to by the returned content address CA is then used as the block data D for which writing was requested. That is, by designating the area referred to by the returned content address CA as the storage destination of the requested block data D, that block data D is regarded as having been stored. This eliminates the need to actually store the block data D of the write request in the storage device.
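The deduplicating write path described so far (split into blocks, hash each block, look the hash up in the index, reuse the existing address on a hit) can be sketched as follows. The `DedupStore` class is hypothetical: an in-memory dictionary stands in for the MFI file, a list index stands in for the content address, and SHA-256 is an assumed hash function.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # 64 KB, the example block size given in the text


class DedupStore:
    """Toy single-node store that deduplicates at block granularity."""

    def __init__(self):
        self.mfi = {}     # hash value H -> content address CA (stand-in for the MFI file)
        self.blocks = []  # stored block contents; the list index serves as the address

    def write_block(self, block: bytes):
        """Return (content_address, actually_written) for one block."""
        h = hashlib.sha256(block).digest()
        if h in self.mfi:
            # Same content already stored: return the existing address, write nothing.
            return self.mfi[h], False
        self.blocks.append(block)
        ca = len(self.blocks) - 1
        self.mfi[h] = ca
        return ca, True

    def write_file(self, data: bytes):
        """Split data into fixed-size blocks and return their content addresses."""
        addresses = []
        for offset in range(0, len(data), BLOCK_SIZE):
            ca, _ = self.write_block(data[offset:offset + BLOCK_SIZE])
            addresses.append(ca)
        return addresses

    def read_file(self, addresses) -> bytes:
        """Rebuild the file from its list of content addresses."""
        return b"".join(self.blocks[ca] for ca in addresses)
```

A file consisting of two identical 64 KB blocks followed by a short tail occupies only two stored blocks, yet reads back unchanged through its three addresses.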

If it is determined that the block data D of the write request has not yet been stored, the block data D is written as follows. First, the block data D of the write request is compressed and, as indicated by arrow Y5 in FIG. 4, divided into a plurality of pieces of fragment data of a predetermined size. For example, as indicated by reference signs D1 to D9 in FIG. 3, it is divided into nine pieces of fragment data (divided data 41). In addition, redundant data is generated so that the original block data can be restored even if some of the divided fragment data is lost, and is added to the divided fragment data 41. For example, as indicated by reference signs D10 to D12 in FIG. 3, three pieces of fragment data (redundant data 42) are added. This produces a data set 40 consisting of twelve pieces of fragment data: nine pieces of divided data 41 and three pieces of redundant data 42.
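The 9 + 3 fragment layout can be illustrated with a deliberately simplified redundancy scheme. The text does not say which redundancy code is used; the sketch below assumes zlib compression and one XOR parity fragment per group of three data fragments, which tolerates only one lost fragment per group and is therefore weaker than a general erasure code. All names are hypothetical.

```python
import zlib
from functools import reduce

N_DATA, N_PARITY = 9, 3  # nine divided fragments plus three redundant fragments


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_fragments(block: bytes) -> list:
    """Compress a block and return 9 data fragments followed by 3 parity fragments."""
    payload = zlib.compress(block)
    body = len(payload).to_bytes(4, "big") + payload  # length prefix, so padding can be dropped
    frag_len = -(-len(body) // N_DATA)                # ceiling division
    body = body.ljust(frag_len * N_DATA, b"\0")       # pad to a multiple of nine
    data = [body[i * frag_len:(i + 1) * frag_len] for i in range(N_DATA)]
    parity = [reduce(xor, data[g * 3:(g + 1) * 3]) for g in range(N_PARITY)]
    return data + parity


def restore_block(fragments: list) -> bytes:
    """Restore the block; lost fragments are given as None (at most one per group)."""
    frags = list(fragments)
    for g in range(N_PARITY):
        members = [g * 3, g * 3 + 1, g * 3 + 2, N_DATA + g]
        lost = [i for i in members if frags[i] is None]
        if len(lost) > 1:
            raise ValueError("this simplified scheme tolerates one loss per group")
        if lost:
            # XOR of the three surviving members of the group yields the lost one.
            frags[lost[0]] = reduce(xor, [frags[i] for i in members if i != lost[0]])
    body = b"".join(frags[:N_DATA])
    size = int.from_bytes(body[:4], "big")
    return zlib.decompress(body[4:4 + size])
```

A production system would more likely use a code that survives the loss of any three of the twelve fragments; the XOR grouping is used here only to keep the example short and verifiable.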

Next, the fragment data making up the data set generated as described above are distributed and stored in the storage areas formed in the storage devices. For example, when twelve fragment data D1 to D12 are generated as shown in FIG. 3, the fragment data D1 to D12 are stored one each in data storage files formed in a plurality of storage devices (see arrow Y6 in FIG. 4).
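To make the 9 + 3 layout concrete, the sketch below splits a block into nine data fragments and adds three redundant fragments. The description does not say which redundancy code is used, so simple XOR parity over groups of three fragments is swapped in here purely for illustration; it tolerates only one lost fragment per group, which is weaker than what a real erasure code would give. Compression is omitted, and the function name `make_data_set` is hypothetical.

```python
def make_data_set(block: bytes, n_data: int = 9, group: int = 3):
    """Split `block` into n_data fragments plus one XOR parity fragment per group."""
    frag_len = -(-len(block) // n_data)             # ceiling division
    padded = block.ljust(frag_len * n_data, b"\0")  # zero-pad the tail
    data = [padded[i * frag_len:(i + 1) * frag_len] for i in range(n_data)]

    parity = []
    for g in range(0, n_data, group):
        p = bytes(frag_len)                         # all-zero start value
        for frag in data[g:g + group]:
            p = bytes(a ^ b for a, b in zip(p, frag))
        parity.append(p)                            # assumption: XOR parity
    return data, parity
```

With this stand-in code, a single missing fragment in a group can be rebuilt by XOR-ing the group's parity fragment with the two surviving data fragments.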

Next, the storage system 1 generates and manages a content address CA representing the storage location of the fragment data D1 to D12 stored as described above, that is, the storage location of the block data D restored from the fragment data D1 to D12. Specifically, the content address CA is generated by combining part of the hash value H computed from the content of the stored block data D (a short hash; for example, the first 8 B (bytes) of the hash value H) with information representing the logical storage location. The content address CA is then returned to the file system in the storage system 1 (arrow Y7 in FIG. 4). The storage system 1 then manages, in the file system, identification information such as the file name of the backup target data in association with the content address CA.
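The composition of a content address can be sketched as below. The description fixes only the idea (a short hash taken from the head of H, combined with logical location information); the `ContentAddress` type, the integer used for the logical position, and SHA-256 as the hash function are assumptions for illustration.

```python
import hashlib
from typing import NamedTuple


class ContentAddress(NamedTuple):
    short_hash: bytes      # leading 8 bytes of the hash value H
    logical_position: int  # hypothetical encoding of the logical storage location


def make_content_address(block: bytes, logical_position: int) -> ContentAddress:
    """Combine the short hash of the block with its logical storage position."""
    h = hashlib.sha256(block).digest()  # hash function is an assumption
    return ContentAddress(short_hash=h[:8], logical_position=logical_position)
```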

In addition, each storage node 3 manages the content address CA of the block data D in association with the hash value H of that block data D in the MFI file. In this way, the content address CA is stored in the storage devices of the accelerator node 2 and the storage nodes 3 in association with information identifying the file, the hash value H, and the like.

The storage system 1 also controls reading of files stored as described above. For example, when a read request designating a specific file is issued to the storage system 1, the content address CA corresponding to the requested file, which consists of the short hash (part of the hash value) and the logical position information, is first identified on the basis of the file system. It is then checked whether the content address CA is registered in the MFI file. If it is not registered, the requested data is not stored, so an error is returned.

If the content address CA of the read request is registered, the storage location designated by the content address CA is identified, and the fragment data stored at that location are read as the requested data. At this time, if the data storage files holding the fragments and the storage position of one fragment data within its data storage file are known, the storage positions of the other fragment data can be identified from that same storage position.

The block data D is then restored from the fragment data read in response to the read request. A plurality of restored block data D are concatenated, restored into a group of data such as file A, and returned.

Next, a further description is given of the configuration for calculating the progress rate when data written in the storage device 11 of the master system 10 (master-side storage system), which has the deduplication function described above, is replicated to the storage device 21 of the replica system 20 (replica-side storage system), which also has the deduplication function.

First, when writing a file to the storage device 11, the file system view providing unit 14 of the master system 10 writes block data while eliminating duplicate storage as described above. At this time, it determines whether each block data was deduplicated and sets, in association with the address of the block data, a deduplication flag indicating whether it was deduplicated. Then, as shown in FIG. 5, it creates metadata in which a deduplication flag is set for each address of the block data constituting the file, and stores the metadata in the storage device 11. By examining this metadata, the master system 10 can identify the block data stored in the storage device 11 without deduplication and, as described later, can determine on the master system 10 side the size of the block data falling under classification B or D.
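A rough sketch of such per-file metadata, and of how the non-deduplicated size (the B + D total) could be read off it, is shown below. The field names and the presence of a `size` field in each entry are assumptions; the description states only that a deduplication flag is kept per block address.

```python
from dataclasses import dataclass


@dataclass
class BlockEntry:
    address: str        # content address of the block (representation assumed)
    size: int           # block size in bytes (assumed to be available here)
    deduplicated: bool  # deduplication flag set at write time on the master


def non_deduplicated_size(metadata: list) -> int:
    """Total size of blocks actually written on the master, i.e. B + D."""
    return sum(entry.size for entry in metadata if not entry.deduplicated)
```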

The master-side replication execution unit 12 (replication execution unit) communicates over a network with the replica-side replication execution unit 22 (replication execution unit) and replicates the data stored in the storage device 11 of the master system 10 to the storage device 21 of the replica system 20. When replicating the data stored in the storage device 11, the master-side replication execution unit 12 cooperates with the replica-side replication execution unit 22 to investigate whether the data to be replicated is stored deduplicated in the master system 10 and whether it will be stored deduplicated in the replica system 20 after replication. Specifically, it checks which of the following four classifications the data to be replicated falls under. The classifications are tabulated in FIG. 6.

A: At write time on the master system, the data already existed in the master system, so it was deduplicated and no data was actually written; the same data also exists in the replica system. (Third classification)
B: At write time on the master system, the data did not exist in the master system, so it was actually written; however, the same data exists in the replica system. (First classification)
C: At write time on the master system, the data already existed in the master system, so it was deduplicated and no data was actually written; however, the same data does not exist in the replica system. (Second classification)
D: At write time on the master system, the data did not exist in the master system, so it was actually written; the same data does not exist in the replica system either. (Fourth classification)
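The four classifications reduce to two yes/no questions per block, which can be sketched as a small function. The function and parameter names are hypothetical.

```python
def classify(dedup_on_master: bool, exists_on_replica: bool) -> str:
    """Map the two per-block facts to classification A, B, C, or D."""
    if exists_on_replica:
        # Replica already holds the same content: no transfer needed.
        return "A" if dedup_on_master else "B"
    # Replica lacks the content: the block must be transferred.
    return "C" if dedup_on_master else "D"
```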

Specifically, the master-side replication execution unit 12 reads the metadata of the block data to be replicated from the storage device 11 and checks, from the deduplication flag, whether the block data is deduplicated. The master-side replication execution unit 12 also reads the hash value, which is the digest of the block data to be replicated, from the storage device 11 and requests the replica-side replication execution unit 22 to confirm whether the block data is already stored. In response, the replica-side replication execution unit 22 checks, from the hash value of the block data, whether block data with the same content is stored in the storage device 21 and notifies the master-side replication execution unit 12 of the result.

The master-side replication execution unit 12 then determines which of classifications A, B, C, and D each block data falls under, based on the result of the above investigation, that is, the confirmation result received from the replica-side replication execution unit 22 (information from which it can be determined whether each block data falls under A or B, or under C or D), and on the deduplication flag corresponding to each block data in the metadata. At this time, the master-side replication execution unit 12 obtains the size (capacity) of each block data, adds the sizes up per classification, and stores the totals in the data classification information management unit 13. For block data classified as C or D, the master-side replication execution unit 12 reads the block data from the storage device 11 and transmits it to the replica-side replication execution unit 22 for replication.

The data classification information management unit 13 holds, as classification information, a counter for each of the classifications A, B, C, and D. This classification information is initialized at the start of replication.

The progress information browsing unit 15 (progress rate calculation unit) receives a request to view the progress rate from the user system 30, calculates the progress rate, and provides it to the user system 30 so that the user can view it. The progress rate is calculated by dividing the amount of data transferred from the master system 10 to the replica system 20 between the start of replication and the time of the calculation (C' + D') by the total amount of data that can be transferred from the master system 10 to the replica system 20 over the entire replication and newly stored in the replica system 20 without deduplication (C + D). The total (C + D) is estimated by correcting the amount of data stored in the master system 10 without deduplication, that is, the data classified as B or D (B + D), with the amounts of data determined at the time of the calculation to fall under classifications B and C (B', C'). The specific calculation method is described in detail in the operation description below.

[Operation]
Next, the operation of the replication system described above is explained with reference to the flowcharts of FIGS. 7 to 9. First, with reference to FIG. 7, the operation when a file is written to the master system 10 in response to a request from the user system 30 is described.

On receiving a file write request from the user system 30, the file system view providing unit 14 requests the storage device 11 to perform the write (step S1). As described above, the write process performs a duplicate check to determine whether block data with the same content as the block data constituting the file is already stored (step S2); if such block data exists it is referenced, and if not, the block data is newly stored in the storage device 11 (step S3).

When writing of the file is complete, the file system view providing unit 14 receives from the storage device 11 a list of the addresses of the block data constituting the written file, sets for each a deduplication flag indicating whether it was deduplicated (that is, whether it falls under A or C, or under B or D), and creates the metadata of the file. The metadata of the file is then written to the storage device 11 (step S4).

Next, the operation during replication is described with reference to FIG. 8. At the start of replication, the master-side replication execution unit 12 requests the data classification information management unit 13 to start recording classification information. On receiving this request, the data classification information management unit 13 prepares initialized classification information (a set of counters cleared to zero) (step S11).

Next, the master-side replication execution unit 12 requests the storage device 11 to read the digest (hash value) of the block data to be replicated and obtains the digest (step S12). The master-side replication execution unit 12 then sends the digest to the replica-side replication execution unit 22 and requests confirmation of whether the block data to be replicated exists (step S13).

The replica-side replication execution unit 22 passes the digest received from the master-side replication execution unit 12 to the storage device 21 and checks whether the same block data is held. It then notifies the master-side replication execution unit 12 of the existence check result.

The master-side replication execution unit 12 receives from the replica-side replication execution unit 22 the result of checking whether the block data exists in the replica system 20, that is, information from which it can be determined whether each block data falls under A or B, or under C or D. The master-side replication execution unit 12 also obtains the metadata from the storage device 11 and checks, based on the deduplication flag corresponding to each block data in the metadata, whether the block data is deduplicated in the master system (step S14). Then, based on the existence check result from the replica-side replication execution unit 22 and its own deduplication check result, the master-side replication execution unit 12 determines which of A, B, C, and D each block data is classified as (step S15).

Next, the master-side replication execution unit 12 requests the data classification information management unit 13 to add the size of the block data to the counter corresponding to the determined classification (step S16). Block data classified as C or D, that is, block data that does not exist in the replica system 20, is read from the storage device 11 by the master-side replication execution unit 12 and transmitted to the replica-side replication execution unit 22.

The replica-side replication execution unit 22 requests the storage device 21 to write the block data received from the master-side replication execution unit 12, and the block data is written to the storage device 21. The processing from step S12 to step S16 described above is performed for all block data to be replicated (step S17).
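Steps S11 to S17 can be sketched as a single loop, shown below under simplifying assumptions: the replica is modelled as an in-memory set of hash values, each block is a `(hash, size, dedup_on_master)` tuple, and a transferred block is assumed to become visible to later existence checks. None of these names or representations come from the description.

```python
def replicate(blocks, replica_hashes: set):
    """Classify every block, accumulate per-class sizes, and 'transfer' C/D blocks.

    blocks: iterable of (hash_value, size, dedup_on_master) tuples (assumed shape).
    replica_hashes: hashes already stored on the replica; updated in place.
    """
    counters = {"A": 0, "B": 0, "C": 0, "D": 0}   # S11: cleared counters
    transferred = []
    for hash_value, size, dedup_on_master in blocks:  # S17: repeat for all blocks
        exists = hash_value in replica_hashes          # S13: existence check
        if exists:                                     # S14/S15: classify
            cls = "A" if dedup_on_master else "B"
        else:
            cls = "C" if dedup_on_master else "D"
        counters[cls] += size                          # S16: add block size
        if not exists:
            transferred.append(hash_value)             # send block to replica
            replica_hashes.add(hash_value)             # assumption: now stored
    return counters, transferred
```

The returned counters correspond to the running values A', B', C', and D' that the progress calculation reads while replication is still under way.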

Next, the operation when providing the progress rate is described with reference to FIG. 9. During replication, the progress information browsing unit 15 receives a request from the user system 30 to obtain the progress rate and requests the progress rate from the master-side replication execution unit 12 (step S21). The master-side replication execution unit 12 requests the classification information from the data classification information management unit 13, obtains the current counter value for each of classifications A, B, C, and D, calculates the progress rate with the following formula, and returns it to the progress information browsing unit 15. The calculation may instead be performed by the progress information browsing unit 15.

Progress rate = (C' + D') ÷ ((B + D) − ((B' + D') ÷ (B + D)) × (B' ÷ (B' + D')) × (B + D) + ((A' + C') ÷ (A + C)) × (C' ÷ (A' + C')) × (A + C))
= (C' + D') ÷ (B + D − B' + C')

In the above formula, A, B, C, and D denote the final data amount of each classification (the totals at the time replication completes). A', B', C', and D' denote the classification information of each classification obtained from the data classification information management unit 13 at some point during replication (for example, when the progress rate is requested).
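In its simplified form the progress rate needs only the master-side total B + D and the running counters B', C', and D'. The function below is a minimal sketch of that simplified form; the function and parameter names are hypothetical, and the guard against a non-positive denominator is an addition for the example that the description does not discuss.

```python
def progress_rate(b_plus_d_total: float, b_seen: float,
                  c_seen: float, d_seen: float) -> float:
    """Progress = (C' + D') / (B + D - B' + C').

    b_plus_d_total: size of data stored on the master without deduplication (B + D).
    b_seen, c_seen, d_seen: running counters B', C', D' at the time of the request.
    """
    estimated_total = b_plus_d_total - b_seen + c_seen  # estimate of C + D
    if estimated_total <= 0:
        # Arbitrary choice for the degenerate case; not covered by the source.
        return 0.0
    return (c_seen + d_seen) / estimated_total
```

For example, with B + D = 100, B' = 10, C' = 5, and D' = 40, the estimated total is 95 and the progress rate is 45 / 95, or roughly 47%.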

The sub-expressions in the above formula represent the following.
(B' + D') ÷ (B + D): progress rate of the processing of block data that was not deduplicated at write time on the master system 10
(A' + C') ÷ (A + C): progress rate of the processing of block data that was deduplicated at write time on the master system 10
B' ÷ (B' + D'): among block data that was not deduplicated at write time on the master system 10 and has already been processed by replication, the proportion that was deduplicated at the replica site 20
C' ÷ (A' + C'): among block data that was deduplicated at write time on the master system 10 and has already been processed by replication, the proportion that was not deduplicated at the replica site 20

The above formula starts from the assumption that most data is classified as A or D and that B and C hardly occur, in other words that most of the data stored in the replica system 20 is a copy of data stored in the master system 10. Under this assumption, the predicted processing amount at the start of replication is set to "B + D": since B and C hardly occur, the processing amount for the entire replication is taken as C + D ≈ B + D. The value "B + D" can be obtained on the master system 10 by referring to the deduplication flags. The predicted value "B + D" is then corrected using the information obtained during processing (A', B', C', D'), and the progress rate is calculated by dividing the amount of data transferred so far (C' + D') by the corrected value.

Specifically, the predicted processing amount "B + D" is corrected as follows. First, for data classified as B or D, the following is subtracted from the initial value "B + D": the ratio of data deduplicated at the replica (B' ÷ (B' + D')) multiplied by the total amount (B + D) and, to reduce the effect of fluctuations caused by uneven occurrence of B and D, by the completion ratio of the processing of data belonging to B and D ((B' + D') ÷ (B + D)). Next, for data classified as A or C, the following is added to the initial value "B + D": the ratio of data not deduplicated at the replica (C' ÷ (A' + C')) multiplied by the total amount (A + C) and, to reduce the effect of fluctuations caused by uneven occurrence of A and C, by the completion ratio of the processing of data belonging to A and C ((A' + C') ÷ (A + C)).

This correction of the predicted processing amount (C + D) is equivalent to subtracting "B'", obtained during replication, from the initial value "B + D" and adding "C'", and it ultimately represents the true processing amount "C + D" appropriately.
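The equivalence can be checked by cancelling terms in the two correction products, assuming none of B + D, B' + D', A + C, and A' + C' is zero. The last line shows why the estimate matches the true amount once replication has finished and the running counters equal the final totals.

```latex
\begin{aligned}
\frac{B' + D'}{B + D} \cdot \frac{B'}{B' + D'} \cdot (B + D) &= B' \\
\frac{A' + C'}{A + C} \cdot \frac{C'}{A' + C'} \cdot (A + C) &= C' \\
(B + D) - B' + C' &= C + D \quad \text{when } B' = B,\ C' = C
\end{aligned}
```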

As described above, according to the present invention, processed data is classified into A, B, C, and D during replication and totaled per classification, an estimate of the processing amount "C + D" is obtained from these totals, and the progress rate is calculated. A progress rate closer to the actual state can therefore be obtained.

<Supplementary Notes>
Part or all of the above embodiment can also be described as in the following supplementary notes. An outline of the configurations of the replication system (see FIG. 10), the program, and the replication method according to the present invention is given below. However, the present invention is not limited to the following configurations.

(Supplementary Note 1)
A replication system in which a master-side storage system 100 and a replica-side storage system 200, each including a storage device 110, 210, store data in the storage device and have a function of eliminating duplicate storage such that, when other data having the same content as data already stored in the storage device is to be stored, the already-stored data is referenced as the other data, the replication system comprising:
replication execution units 101, 201 that replicate data stored in the storage device of the master-side storage system to the storage device of the replica-side storage system; and
a progress rate calculation unit 102 that calculates the progress rate of the replication at a predetermined point during replication,
wherein the replication execution units 101, 201 investigate whether data to be replicated from the master-side storage system to the replica-side storage system will be stored deduplicated in the storage device of the replica-side storage system, and transfer, from the master-side storage system to the replica-side storage system, that part of the data to be replicated which will be stored in the replica-side storage system without deduplication, and
the progress rate calculation unit 102 estimates, based on the result of investigating the data to be replicated, the total amount of data that will be transferred from the master-side storage system to the replica-side storage system over the entire replication and newly stored in the storage device of the replica-side storage system without deduplication, and calculates the progress rate of the replication at the predetermined point based on the estimated total amount and the amount of data transferred from the master-side storage system to the replica-side storage system by the predetermined point.

(Supplementary Note 2)
The replication system according to Supplementary Note 1, wherein
the replication execution unit investigates which classification the data to be replicated from the master-side storage system to the replica-side storage system belongs to, the classifications being set according to whether the data is stored deduplicated in the storage device of the master-side storage system and whether it will be stored deduplicated in the storage device of the replica-side storage system, and calculates the amount of data belonging to each classification, and
the progress rate calculation unit estimates, based on the amount of data for each classification to which the data to be replicated belongs, the total amount of data that will be transferred from the master-side storage system to the replica-side storage system over the entire replication and newly stored in the storage device of the replica-side storage system without deduplication.

(Supplementary Note 3)
The replication system according to Supplementary Note 2, wherein
the replication execution unit investigates at least whether the data to be replicated from the master-side storage system to the replica-side storage system belongs to a first classification (B), in which the data is stored in the storage device of the master-side storage system without deduplication but will be stored deduplicated in the storage device of the replica-side storage system, or to a second classification (C), in which the data is stored deduplicated in the storage device of the master-side storage system but will be stored in the storage device of the replica-side storage system without deduplication, and calculates the amount of data belonging to each of the first classification and the second classification.

(Appendix 4)
The replication system according to appendix 2, wherein
the replication execution unit investigates whether the data to be replicated from the master-side storage system to the replica-side storage system belongs to each of the following classifications, and calculates the amount of data belonging to each classification:
a first classification (B), which is data stored in the storage device of the master-side storage system without being deduplicated but stored in the storage device of the replica-side storage system with deduplication;
a second classification (C), which is data stored in the storage device of the master-side storage system with deduplication but stored in the storage device of the replica-side storage system without being deduplicated;
a third classification (A), which is data stored with deduplication in the storage device of the master-side storage system and also stored with deduplication in the storage device of the replica-side storage system; and
a fourth classification (D), which is data stored without being deduplicated in the storage device of the master-side storage system and also stored without being deduplicated in the storage device of the replica-side storage system.
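As an illustration only, the four classifications of Appendix 4 can be sketched in Python as follows. The `Block` record, its field names, and both helper functions are hypothetical and do not appear in the specification; the sketch assumes that, for each piece of data, it is already known whether it is deduplicated on the master side and whether it will be deduplicated on the replica side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical record for one piece of data to be replicated."""
    size: int               # data amount in bytes
    dedup_on_master: bool   # stored on the master side with deduplication
    dedup_on_replica: bool  # will be stored on the replica side with deduplication


def classify(block: Block) -> str:
    """Return the classification letter used in Appendix 4."""
    if block.dedup_on_master and block.dedup_on_replica:
        return "A"  # third classification: deduplicated on both sides
    if not block.dedup_on_master and block.dedup_on_replica:
        return "B"  # first classification: deduplicated only on the replica side
    if block.dedup_on_master and not block.dedup_on_replica:
        return "C"  # second classification: deduplicated only on the master side
    return "D"      # fourth classification: deduplicated on neither side


def amounts_by_classification(blocks) -> dict:
    """Sum the data amount belonging to each classification."""
    totals = {"A": 0, "B": 0, "C": 0, "D": 0}
    for block in blocks:
        totals[classify(block)] += block.size
    return totals
```

Only data in classifications C and D is newly stored on the replica side, so only that data needs to be transferred.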

(Appendix 5)
The replication system according to appendix 3 or 4, wherein
the replication execution unit calculates, for each classification, the amount of the data to be replicated that has been investigated at a predetermined replication point, and
the progress rate calculation unit estimates, based on the data amounts of the first classification and the second classification investigated at the predetermined replication point, the total amount of data that, over the entire replication, is transferred from the master-side storage system to the replica-side storage system and newly stored in the storage device of the replica-side storage system without being deduplicated.

(Appendix 6)
The replication system according to appendix 5, wherein
the progress rate calculation unit estimates, as the total amount of data that, over the entire replication, is transferred from the master-side storage system to the replica-side storage system and newly stored in the storage device of the replica-side storage system without being deduplicated, a value obtained by subtracting the data amount of the first classification from, and adding the data amount of the second classification to, the amount of data stored in the storage device of the master-side storage system without being deduplicated, and calculates the progress rate by dividing the amount of data transferred from the master-side storage system to the replica-side storage system at the predetermined replication point by the estimated total amount.
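The arithmetic of Appendix 6 can be sketched as below. This is a minimal illustration, not the implementation of the specification: the function and parameter names are hypothetical, and returning 1.0 when the estimated total is not positive is an assumption added here to avoid division by zero.

```python
def estimate_total_transfer(master_non_dedup_bytes: int,
                            class_b_bytes: int,
                            class_c_bytes: int) -> int:
    """Estimated total amount newly stored on the replica side.

    Data stored without deduplication on the master side, minus the first
    classification (B), plus the second classification (C).
    """
    return master_non_dedup_bytes - class_b_bytes + class_c_bytes


def progress_rate(transferred_bytes: int,
                  master_non_dedup_bytes: int,
                  class_b_bytes: int,
                  class_c_bytes: int) -> float:
    """Transferred amount divided by the estimated total amount."""
    total = estimate_total_transfer(master_non_dedup_bytes,
                                    class_b_bytes, class_c_bytes)
    if total <= 0:
        # Assumption (not in the specification): nothing needs to be
        # transferred, so the replication is treated as complete.
        return 1.0
    return transferred_bytes / total
```

For example, with 1000 bytes stored without deduplication on the master side, 200 bytes in classification B, and 50 bytes in classification C, the estimated total is 850 bytes, so 425 transferred bytes correspond to a progress rate of 0.5.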

(Appendix 7)
The replication system according to any one of appendices 1 to 6, wherein
the replication execution unit investigates the data to be replicated based on data feature amount information that represents a feature amount of the data derived from the content of the data.
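A content-based investigation of the kind described in Appendix 7 might look like the following sketch. It assumes that a SHA-256 digest serves as the data feature amount information; this appendix does not name a particular function, and the helper names here are hypothetical.

```python
import hashlib


def feature_of(data: bytes) -> str:
    """Assumed feature amount: SHA-256 digest of the data content."""
    return hashlib.sha256(data).hexdigest()


def investigate(master_blocks, replica_features):
    """Split data into what must be transferred and what the replica can deduplicate.

    master_blocks: iterable of bytes objects to be replicated.
    replica_features: features of data already stored on the replica side.
    Returns (blocks_to_transfer, features_already_present).
    """
    known = set(replica_features)
    to_transfer = []
    already_present = []
    for data in master_blocks:
        feature = feature_of(data)
        if feature in known:
            # Same content exists on the replica side: deduplicated, no transfer.
            already_present.append(feature)
        else:
            to_transfer.append(data)
            # Later data with the same content will be deduplicated.
            known.add(feature)
    return to_transfer, already_present
```

Because only the features are compared, the data content itself needs to be sent only when the replica side does not already hold it.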

(Appendix 8)
A program for an information processing device that controls replication between a master-side storage system and a replica-side storage system, each of which has a storage device, stores data in the storage device, and has a function of eliminating duplicate storage by which, when other data having the same content as data already stored in the storage device is to be stored, the already stored data is referenced as the other data, the program causing the information processing device to realize:
a replication execution unit that replicates data stored in the storage device of the master-side storage system to the storage device of the replica-side storage system; and
a progress rate calculation unit that calculates the progress rate of the replication at a predetermined replication point, wherein
the replication execution unit investigates whether the data to be replicated from the master-side storage system to the replica-side storage system is stored in the storage device of the replica-side storage system with deduplication, and transfers, from the master-side storage system to the replica-side storage system, the data to be replicated that is stored in the replica-side storage system without being deduplicated, and
the progress rate calculation unit estimates, based on the result of investigating the data to be replicated, the total amount of data that, over the entire replication, is transferred from the master-side storage system to the replica-side storage system and newly stored in the storage device of the replica-side storage system without being deduplicated, and calculates the progress rate of the replication at the predetermined replication point based on the estimated total amount and the amount of data transferred from the master-side storage system to the replica-side storage system at the predetermined replication point.

(Appendix 8-2)
The program according to appendix 8, wherein
the replication execution unit investigates to which of a set of classifications the data to be replicated from the master-side storage system to the replica-side storage system belongs, the classifications being defined according to whether the data is stored in the storage device of the master-side storage system with deduplication and whether the data is stored in the storage device of the replica-side storage system with deduplication, and calculates the amount of data belonging to each classification, and
the progress rate calculation unit estimates, based on the data amount for each classification to which the data to be replicated belongs, the total amount of data that, over the entire replication, is transferred from the master-side storage system to the replica-side storage system and newly stored in the storage device of the replica-side storage system without being deduplicated.

(Appendix 9)
A replication method between a master-side storage system and a replica-side storage system, each of which has a storage device, stores data in the storage device, and has a function of eliminating duplicate storage by which, when other data having the same content as data already stored in the storage device is to be stored, the already stored data is referenced as the other data, the method comprising:
a replication execution step of replicating data stored in the storage device of the master-side storage system to the storage device of the replica-side storage system; and
a progress rate calculation step of calculating the progress rate of the replication at a predetermined replication point, wherein
the replication execution step investigates whether the data to be replicated from the master-side storage system to the replica-side storage system is stored in the storage device of the replica-side storage system with deduplication, and transfers, from the master-side storage system to the replica-side storage system, the data to be replicated that is stored in the replica-side storage system without being deduplicated, and
the progress rate calculation step estimates, based on the result of investigating the data to be replicated, the total amount of data that, over the entire replication, is transferred from the master-side storage system to the replica-side storage system and newly stored in the storage device of the replica-side storage system without being deduplicated, and calculates the progress rate of the replication at the predetermined replication point based on the estimated total amount and the amount of data transferred from the master-side storage system to the replica-side storage system at the predetermined replication point.
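The two steps of the method can be pictured together as a loop that investigates each piece of data, transfers it only when the replica side cannot deduplicate it, and then recomputes the progress rate. The sketch below is one possible reading under stated assumptions: the tuple layout, the use of a set of features as the replica-side store, and the estimate "master non-deduplicated amount minus B plus C" (taken from the progress formula given elsewhere in this description) are illustrative choices, not the specified implementation.

```python
def replicate(blocks, replica_features, master_non_dedup_bytes):
    """Yield the progress rate after each piece of data is processed.

    blocks: iterable of (feature, size, dedup_on_master) tuples (hypothetical layout).
    replica_features: set of features already stored on the replica side (mutated).
    master_non_dedup_bytes: amount stored on the master side without deduplication.
    """
    class_b = 0      # not deduplicated on master, deduplicated on replica
    class_c = 0      # deduplicated on master, not deduplicated on replica
    transferred = 0  # amount actually sent to the replica side
    for feature, size, dedup_on_master in blocks:
        if feature in replica_features:
            # Replica side deduplicates this data: nothing is transferred.
            if not dedup_on_master:
                class_b += size
        else:
            # Replica side stores this data newly: transfer it.
            if dedup_on_master:
                class_c += size
            replica_features.add(feature)
            transferred += size
        total = master_non_dedup_bytes - class_b + class_c
        # Assumption: a non-positive estimate means nothing is left to send.
        yield transferred / total if total > 0 else 1.0
```

The estimated total changes as more data is investigated, so the reported rate is an estimate that is refined over the course of the replication.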

(Appendix 10)
The replication method according to appendix 9, wherein
the replication execution step investigates to which of a set of classifications the data to be replicated from the master-side storage system to the replica-side storage system belongs, the classifications being defined according to whether the data is stored in the storage device of the master-side storage system with deduplication and whether the data is stored in the storage device of the replica-side storage system with deduplication, and calculates the amount of data belonging to each classification, and
the progress rate calculation step estimates, based on the data amount for each classification to which the data to be replicated belongs, the total amount of data that, over the entire replication, is transferred from the master-side storage system to the replica-side storage system and newly stored in the storage device of the replica-side storage system without being deduplicated.

Note that the program described above is stored in a storage device or recorded on a computer-readable recording medium. For example, the recording medium is a portable medium such as a flexible disk, an optical disk, a magneto-optical disk, or a semiconductor memory.

Although the present invention has been described with reference to the above embodiments and the like, the present invention is not limited to those embodiments. Various changes that those skilled in the art can understand may be made to the configuration and details of the present invention within its scope.

DESCRIPTION OF SYMBOLS
1 Storage system
2 Accelerator node
3 Storage node
4 Backup system
10 Master system
11 Storage device
12 Master-side replication execution unit
13 Data classification information management unit
14 File system view providing unit
15 Progress information browsing unit
20 Replica system
21 Storage device
22 Replica-side replication execution unit
100 Master-side storage system
101, 201 Replication execution unit
102 Progress rate calculation unit
110, 210 Storage device
200 Replica-side storage system

Claims

A replication system in which a master-side storage system and a replica-side storage system, each having a storage device, store data in the storage device and have a function of eliminating duplicate storage by which, when other data having the same content as data already stored in the storage device is to be stored, the already stored data is referenced as the other data, the replication system comprising:
a replication execution unit that replicates data stored in the storage device of the master-side storage system to the storage device of the replica-side storage system; and
a progress rate calculation unit that calculates the progress rate of the replication at a predetermined replication point, wherein
the replication execution unit investigates whether the data to be replicated from the master-side storage system to the replica-side storage system belongs to a first classification, which is data stored in the storage device of the master-side storage system without being deduplicated but stored in the storage device of the replica-side storage system with deduplication; a second classification, which is data stored in the storage device of the master-side storage system with deduplication but stored in the storage device of the replica-side storage system without being deduplicated; a third classification, which is data stored with deduplication in the storage device of the master-side storage system and also stored with deduplication in the storage device of the replica-side storage system; or a fourth classification, which is data stored without being deduplicated in the storage device of the master-side storage system and also stored without being deduplicated in the storage device of the replica-side storage system; calculates the amount of data belonging to each classification; and further transfers, from the master-side storage system to the replica-side storage system, the data to be replicated that is stored in the replica-side storage system without being deduplicated,
the progress rate calculation unit estimates, based on the data amount for each classification to which the data to be replicated belongs, the total amount of data that, over the entire replication, is transferred from the master-side storage system to the replica-side storage system and newly stored in the storage device of the replica-side storage system without being deduplicated, and calculates the progress rate of the replication at the predetermined replication point based on the estimated total amount and the amount of data transferred from the master-side storage system to the replica-side storage system at the predetermined replication point,
and further,
the replication execution unit calculates, for each classification, the amount of the data to be replicated that has been investigated at the predetermined replication point, and
the progress rate calculation unit, based on the data amounts of the first classification and the second classification investigated at the predetermined replication point, estimates, as the total amount of data that, over the entire replication, is transferred from the master-side storage system to the replica-side storage system and newly stored in the storage device of the replica-side storage system without being deduplicated, a value obtained by subtracting the data amount of the first classification from, and adding the data amount of the second classification to, the amount of data stored in the storage device of the master-side storage system without being deduplicated, and calculates the progress rate by dividing the amount of data transferred from the master-side storage system to the replica-side storage system at the predetermined replication point by the estimated total amount.

A replication system in which a master-side storage system and a replica-side storage system, each having a storage device, store data in the storage device and have a function of eliminating duplicate storage by which, when other data having the same content as data already stored in the storage device is to be stored, the already stored data is referenced as the other data, the replication system comprising:
a replication execution unit that replicates data stored in the storage device of the master-side storage system to the storage device of the replica-side storage system; and
a progress rate calculation unit that calculates the progress rate of the replication at a predetermined replication point, wherein
the replication execution unit investigates at least whether the data to be replicated from the master-side storage system to the replica-side storage system belongs to a first classification, which is data stored in the storage device of the master-side storage system without being deduplicated but stored in the storage device of the replica-side storage system with deduplication, or to a second classification, which is data stored in the storage device of the master-side storage system with deduplication but stored in the storage device of the replica-side storage system without being deduplicated; calculates the amount of data belonging to each of the first classification and the second classification; and further transfers, from the master-side storage system to the replica-side storage system, the data to be replicated that is stored in the replica-side storage system without being deduplicated, and
the progress rate calculation unit, based on the data amounts of the first classification and the second classification investigated at the predetermined replication point, estimates, as the total amount of data that, over the entire replication, is transferred from the master-side storage system to the replica-side storage system and newly stored in the storage device of the replica-side storage system without being deduplicated, a value obtained by subtracting the data amount of the first classification from, and adding the data amount of the second classification to, the amount of data stored in the storage device of the master-side storage system without being deduplicated, and calculates the progress rate by dividing the amount of data transferred from the master-side storage system to the replica-side storage system at the predetermined replication point by the estimated total amount.

The replication system according to claim 1 or 2, wherein
the replication execution unit investigates the data to be replicated based on data feature amount information that represents a feature amount of the data derived from the content of the data.

A program for an information processing device that controls replication between a master-side storage system and a replica-side storage system, each of which has a storage device, stores data in the storage device, and has a function of eliminating duplicate storage by which, when other data having the same content as data already stored in the storage device is to be stored, the already stored data is referenced as the other data, the program causing the information processing device to realize:
a replication execution unit that replicates data stored in the storage device of the master-side storage system to the storage device of the replica-side storage system; and
a progress rate calculation unit that calculates the progress rate of the replication at a predetermined replication point, wherein
the replication execution unit investigates whether the data to be replicated from the master-side storage system to the replica-side storage system belongs to a first classification, which is data stored in the storage device of the master-side storage system without being deduplicated but stored in the storage device of the replica-side storage system with deduplication; a second classification, which is data stored in the storage device of the master-side storage system with deduplication but stored in the storage device of the replica-side storage system without being deduplicated; a third classification, which is data stored with deduplication in the storage device of the master-side storage system and also stored with deduplication in the storage device of the replica-side storage system; or a fourth classification, which is data stored without being deduplicated in the storage device of the master-side storage system and also stored without being deduplicated in the storage device of the replica-side storage system; calculates the amount of data belonging to each classification; and further transfers, from the master-side storage system to the replica-side storage system, the data to be replicated that is stored in the replica-side storage system without being deduplicated,
the progress rate calculation unit estimates, based on the data amount for each classification to which the data to be replicated belongs, the total amount of data that, over the entire replication, is transferred from the master-side storage system to the replica-side storage system and newly stored in the storage device of the replica-side storage system without being deduplicated, and calculates the progress rate of the replication at the predetermined replication point based on the estimated total amount and the amount of data transferred from the master-side storage system to the replica-side storage system at the predetermined replication point,
and further,
the replication execution unit calculates, for each classification, the amount of the data to be replicated that has been investigated at the predetermined replication point, and
the progress rate calculation unit, based on the data amounts of the first classification and the second classification investigated at the predetermined replication point, estimates, as the total amount of data that, over the entire replication, is transferred from the master-side storage system to the replica-side storage system and newly stored in the storage device of the replica-side storage system without being deduplicated, a value obtained by subtracting the data amount of the first classification from, and adding the data amount of the second classification to, the amount of data stored in the storage device of the master-side storage system without being deduplicated, and calculates the progress rate by dividing the amount of data transferred from the master-side storage system to the replica-side storage system at the predetermined replication point by the estimated total amount.

A program for an information processing device that controls replication between a master-side storage system and a replica-side storage system, each of which has a storage device, stores data in the storage device, and has a function of eliminating duplicate storage by which, when other data having the same content as data already stored in the storage device is to be stored, the already stored data is referenced as the other data, the program causing the information processing device to realize:
a replication execution unit that replicates data stored in the storage device of the master-side storage system to the storage device of the replica-side storage system; and
a progress rate calculation unit that calculates the progress rate of the replication at a predetermined replication point, wherein
the replication execution unit investigates at least whether the data to be replicated from the master-side storage system to the replica-side storage system belongs to a first classification, which is data stored in the storage device of the master-side storage system without being deduplicated but stored in the storage device of the replica-side storage system with deduplication, or to a second classification, which is data stored in the storage device of the master-side storage system with deduplication but stored in the storage device of the replica-side storage system without being deduplicated; calculates the amount of data belonging to each of the first classification and the second classification; and further transfers, from the master-side storage system to the replica-side storage system, the data to be replicated that is stored in the replica-side storage system without being deduplicated, and
the progress rate calculation unit, based on the data amounts of the first classification and the second classification investigated at the predetermined replication point, estimates, as the total amount of data that, over the entire replication, is transferred from the master-side storage system to the replica-side storage system and newly stored in the storage device of the replica-side storage system without being deduplicated, a value obtained by subtracting the data amount of the first classification from, and adding the data amount of the second classification to, the amount of data stored in the storage device of the master-side storage system without being deduplicated, and calculates the progress rate by dividing the amount of data transferred from the master-side storage system to the replica-side storage system at the predetermined replication point by the estimated total amount.

A replication method between a master-side storage system and a replica-side storage system, each of which has a storage device, stores data in the storage device, and has a function of eliminating duplicate storage by which, when other data having the same content as data already stored in the storage device is to be stored, the already stored data is referenced as the other data, the method comprising:
a replication execution step of replicating data stored in the storage device of the master-side storage system to the storage device of the replica-side storage system; and
a progress rate calculation step of calculating the progress rate of the replication at a predetermined replication point, wherein
the replication execution step investigates whether the data to be replicated from the master-side storage system to the replica-side storage system belongs to a first classification, which is data stored in the storage device of the master-side storage system without being deduplicated but stored in the storage device of the replica-side storage system with deduplication; a second classification, which is data stored in the storage device of the master-side storage system with deduplication but stored in the storage device of the replica-side storage system without being deduplicated; a third classification, which is data stored with deduplication in the storage device of the master-side storage system and also stored with deduplication in the storage device of the replica-side storage system; or a fourth classification, which is data stored without being deduplicated in the storage device of the master-side storage system and also stored without being deduplicated in the storage device of the replica-side storage system; calculates the amount of data belonging to each classification; and further transfers, from the master-side storage system to the replica-side storage system, the data to be replicated that is stored in the replica-side storage system without being deduplicated,
the progress rate calculation step estimates, based on the data amount for each classification to which the data to be replicated belongs, the total amount of data that, over the entire replication, is transferred from the master-side storage system to the replica-side storage system and newly stored in the storage device of the replica-side storage system without being deduplicated, and calculates the progress rate of the replication at the predetermined replication point based on the estimated total amount and the amount of data transferred from the master-side storage system to the replica-side storage system at the predetermined replication point,
and further,
the replication execution step calculates, for each classification, the amount of the data to be replicated that has been investigated at the predetermined replication point, and
the progress rate calculation step, based on the data amounts of the first classification and the second classification investigated at the predetermined replication point, estimates, as the total amount of data that, over the entire replication, is transferred from the master-side storage system to the replica-side storage system and newly stored in the storage device of the replica-side storage system without being deduplicated, a value obtained by subtracting the data amount of the first classification from, and adding the data amount of the second classification to, the amount of data stored in the storage device of the master-side storage system without being deduplicated, and calculates the progress rate by dividing the amount of data transferred from the master-side storage system to the replica-side storage system at the predetermined replication point by the estimated total amount.

A replication method between a master-side storage system and a replica-side storage system, each of which has a storage device, stores data in the storage device, and has a function of eliminating duplicate storage by which, when other data having the same content as data already stored in the storage device is to be stored, the already stored data is referenced as the other data, the method comprising:
a replication execution step of replicating data stored in the storage device of the master-side storage system to the storage device of the replica-side storage system; and
a progress rate calculation step of calculating the progress rate of the replication at a predetermined replication point, wherein
the replication execution step investigates at least whether the data to be replicated from the master-side storage system to the replica-side storage system belongs to a first classification, which is data stored in the storage device of the master-side storage system without being deduplicated but stored in the storage device of the replica-side storage system with deduplication, or to a second classification, which is data stored in the storage device of the master-side storage system with deduplication but stored in the storage device of the replica-side storage system without being deduplicated; calculates the amount of data belonging to each of the first classification and the second classification; and further transfers, from the master-side storage system to the replica-side storage system, the data to be replicated that is stored in the replica-side storage system without being deduplicated, and
the progress rate calculation step, based on the data amounts of the first classification and the second classification investigated at the predetermined replication point, estimates, as the total amount of data that, over the entire replication, is transferred from the master-side storage system to the replica-side storage system and newly stored in the storage device of the replica-side storage system without being deduplicated, a value obtained by subtracting the data amount of the first classification from, and adding the data amount of the second classification to, the amount of data stored in the storage device of the master-side storage system without being deduplicated, and calculates the progress rate by dividing the amount of data transferred from the master-side storage system to the replica-side storage system at the predetermined replication point by the estimated total amount.