JP2021179717A

JP2021179717A - File server, deduplication system, processing method, and program

Info

Publication number: JP2021179717A
Application number: JP2020083836A
Authority: JP
Inventors: Satoshi Yokote (横手 聡)
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2020-05-12
Filing date: 2020-05-12
Publication date: 2021-11-18

Abstract

To provide a file server that prevents the utilization efficiency of deduplication storage from decreasing. SOLUTION: A file server is configured: to generate a hash value for each of a plurality of blocks into which a candidate file for archiving to deduplication storage is divided, and to generate a local hash table containing the hash values and, for each hash value, a duplicate count indicating the number of times the same hash value was generated; to transmit the local hash table to a duplicate count totaling server; to receive from the duplicate count totaling server a global hash table containing hash values and duplicate counts updated based on whether each of the plurality of blocks exists in the deduplication storage; and to calculate the archive priority of the file based on the duplicate count corresponding to each hash value in the local hash table as updated from the hash values and duplicate counts contained in the global hash table, the size of the file, and the block size corresponding to the hash value. SELECTED DRAWING: Figure 2

Description

The present invention relates to a file server, a deduplication system, a processing method, and a program.

Patent Document 1 describes a technique for calculating hash values of a plurality of file fragments and comparing the hash values with one another. Patent Document 2 describes a technique for calculating the proportion of data eliminated after deduplication has been performed. Here, deduplication is a technique that, when multiple pieces of data with identical content exist, keeps one and erases the others. Patent Document 3 describes a technique for counting the number of blocks having the same hash value.

Japanese Unexamined Patent Publication No. 2011-065314
Japanese Unexamined Patent Publication No. 2018-055394
International Publication No. 2016/038714

When files with attributes such as low access frequency accumulate on a file server, the storage usage efficiency of the file server decreases. In such a case, the file server uses tiering technology or the like to move those files to storage suited to long-term retention. The destination storage may be deduplication storage. With related tiering techniques, it has not been possible to determine whether a file has a large deduplication effect until the file has been sent to the deduplication storage and it has been confirmed whether data duplicating the file already exists there. Here, a file with a large deduplication effect is a file many parts of which are already stored in the deduplication storage, so that many parts of the file are erased when deduplication is executed. Conversely, a file with a small deduplication effect is a file many parts of which are not stored in the deduplication storage, so that many parts of the file are not erased even when deduplication is executed. If a file sent to the deduplication storage has a small deduplication effect, much of its data is not erased by deduplication, the used area of the deduplication storage grows rapidly, and the usage efficiency of the deduplication storage decreases.

Accordingly, an object of the present invention is to provide a file server, a deduplication system, a processing method, and a program that solve the above problem.

According to a first aspect of the present invention, a file server includes: a table management unit that generates a hash value for each of a plurality of blocks into which an archive candidate file for deduplication storage is divided, and generates a local hash table containing the hash values and, for each hash value, a duplicate count indicating the number of times the same hash value was generated; a transmission unit that transmits the local hash table to a duplicate count totaling server; a reception unit that receives, from the duplicate count totaling server, a global hash table containing hash values and duplicate counts updated based on whether each of the plurality of blocks exists in the deduplication storage; and a priority calculation unit that calculates an archive priority of the file based on the duplicate count corresponding to each hash value in the local hash table as updated from the hash values and duplicate counts contained in the global hash table, the size of the file, and the block size corresponding to the hash value.

According to a second aspect of the present invention, a deduplication system includes one or more file servers and a duplicate count totaling server. The one or more file servers generate a hash value for each of a plurality of blocks into which an archive candidate file for deduplication storage is divided, generate a local hash table containing the hash values and, for each hash value, a duplicate count indicating the number of times the same hash value was generated, and transmit the local hash table to the duplicate count totaling server. The duplicate count totaling server generates a global hash table by aggregating one or more local hash tables, uses the hash values contained in the global hash table to inquire whether the block corresponding to each hash value exists in the deduplication storage, updates the duplicate counts in the global hash table that correspond to blocks existing in the deduplication storage, and transmits the updated global hash table to the one or more file servers. The one or more file servers update their local hash tables based on the hash values and duplicate counts contained in the updated global hash table, and calculate an archive priority of the file based on the duplicate count corresponding to each hash value in the updated local hash table, the size of the file, and the block size corresponding to the hash value.

According to a third aspect of the present invention, in a processing method for a file server: a table management unit of the file server generates a hash value for each of a plurality of blocks into which an archive candidate file for deduplication storage is divided, and generates a local hash table containing the hash values and, for each hash value, a duplicate count indicating the number of times the same hash value was generated; a transmission unit of the file server transmits the local hash table to a duplicate count totaling server; a reception unit of the file server receives, from the duplicate count totaling server, a global hash table containing hash values and duplicate counts updated based on whether each of the plurality of blocks exists in the deduplication storage; and a priority calculation unit of the file server calculates an archive priority of the file based on the duplicate count corresponding to each hash value in the local hash table as updated from the hash values and duplicate counts contained in the global hash table, the size of the file, and the block size corresponding to the hash value.

According to a fourth aspect of the present invention, in a processing method for a deduplication system: one or more file servers generate a hash value for each of a plurality of blocks into which an archive candidate file for deduplication storage is divided, generate a local hash table containing the hash values and, for each hash value, a duplicate count indicating the number of times the same hash value was generated, and transmit the local hash table to a duplicate count totaling server; the duplicate count totaling server generates a global hash table by aggregating one or more local hash tables, uses the hash values contained in the global hash table to inquire whether the block corresponding to each hash value exists in the deduplication storage, updates the duplicate counts in the global hash table that correspond to blocks existing in the deduplication storage, and transmits the updated global hash table to the one or more file servers; and the one or more file servers update their local hash tables based on the hash values and duplicate counts contained in the updated global hash table, and calculate an archive priority of the file based on the duplicate count corresponding to each hash value in the updated local hash table, the size of the file, and the block size corresponding to the hash value.

According to a fifth aspect of the present invention, a program causes a file server to execute: table management means for generating a hash value for each of a plurality of blocks into which an archive candidate file for deduplication storage is divided, and generating a local hash table containing the hash values and, for each hash value, a duplicate count indicating the number of times the same hash value was generated; transmission means for transmitting the local hash table to a duplicate count totaling server; reception means for receiving, from the duplicate count totaling server, a global hash table containing hash values and duplicate counts updated based on whether each of the plurality of blocks exists in the deduplication storage; and priority calculation means for calculating an archive priority of the file based on the duplicate count corresponding to each hash value in the local hash table as updated from the hash values and duplicate counts contained in the global hash table, the size of the file, and the block size corresponding to the hash value.

According to a sixth aspect of the present invention, a program causes a deduplication system to execute: means for generating a hash value for each of a plurality of blocks into which an archive candidate file for deduplication storage is divided, generating a local hash table containing the hash values and, for each hash value, a duplicate count indicating the number of times the same hash value was generated, and transmitting the local hash table to a duplicate count totaling server; means for generating a global hash table by aggregating one or more local hash tables, using the hash values contained in the global hash table to inquire whether the block corresponding to each hash value exists in the deduplication storage, updating the duplicate counts in the global hash table that correspond to blocks existing in the deduplication storage, and transmitting the updated global hash table to one or more file servers; and means for updating the local hash table based on the hash values and duplicate counts contained in the updated global hash table, and calculating an archive priority of the file based on the duplicate count corresponding to each hash value in the updated local hash table, the size of the file, and the block size corresponding to the hash value.

According to the present invention, files with a large deduplication effect are identified by prioritizing files before they are sent to the deduplication storage. This prevents the usage efficiency of the deduplication storage from decreasing.

A diagram showing the minimum configuration of a file server according to an embodiment of the present invention.
A diagram showing the configuration of a deduplication system according to an embodiment of the present invention.
A diagram showing a local hash table according to an embodiment of the present invention.
A diagram showing a file management table according to an embodiment of the present invention.
A diagram showing how a file server according to an embodiment of the present invention transmits a local hash table.
A diagram showing how a duplicate count totaling server according to an embodiment of the present invention operates.
A diagram showing how deduplication storage according to an embodiment of the present invention operates.
A diagram showing how a file server according to an embodiment of the present invention updates a local hash table.
A diagram showing how a file server according to an embodiment of the present invention sets a priority.

A file server according to an embodiment of the present invention is described below with reference to the drawings.

(Minimum configuration of file server)
FIG. 1 is a diagram showing the minimum configuration of a file server according to an embodiment of the present invention. The file server 1 according to the present embodiment includes at least a table management unit 11, a transmission unit 12, a reception unit 13, and a priority calculation unit 14. The table management unit 11 generates a hash value for each of the divided files (hereinafter referred to as blocks) obtained by dividing an archive candidate file for deduplication storage into a plurality of pieces, and generates a local hash table containing the hash values and, for each hash value, a duplicate count indicating the number of times the same hash value was generated. The transmission unit 12 transmits the local hash table to a duplicate count totaling server. The reception unit 13 receives, from the duplicate count totaling server, a global hash table containing hash values and duplicate counts updated based on whether each of the plurality of blocks exists in the deduplication storage. The priority calculation unit 14 calculates the archive priority of the file based on the duplicate count corresponding to each hash value in the local hash table as updated from the hash values and duplicate counts contained in the global hash table, the size of the file, and the block size corresponding to the hash value.

Note that the operation of the file server may be carried out by recording a program for realizing the functions of the processing units in FIG. 1 on a computer-readable recording medium, and having a computer system read and execute the program recorded on that medium. The term "computer system" here includes an OS and hardware such as peripheral devices. A "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. It also includes media that hold a program for a certain period of time, such as volatile memory (RAM) inside a computer system that acts as a server or client when the program is transmitted over a network such as the Internet or a communication line such as a telephone line.

The program may realize only some of the functions described above. It may also be a so-called difference file (difference program), which realizes those functions in combination with a program already recorded in the computer system.

(Configuration of deduplication system)
FIG. 2 is a diagram showing the configuration of a deduplication system according to an embodiment of the present invention. The deduplication system includes a plurality of file servers 1 to 1n, a duplicate count totaling server 2, and deduplication storage 3. Deduplication is a technique that, when data is archived to deduplication storage, analyzes the target data and automatically detects and eliminates duplicate data. In addition to the configuration of FIG. 1, the file server 1 includes a file system 15, a tiering unit 16, and an archive candidate file storage unit 17, as found in a general file server with a tiering function. The file server 1 further includes a local hash table 18 and a file management table 19. Although FIG. 2 details the configuration of the file server 1 only, the other file servers have the same configuration as the file server 1.

The file system 15 manages the data stored in the file server 1 as files and makes them available in response to external requests. The file system 15 may be, for example, NFS (Network File System) or CIFS (Common Internet File System).

The tiering unit 16 prioritizes the files managed by the file system 15 from viewpoints such as low access frequency. According to that priority, the tiering unit 16 stores the candidate files to be moved to external storage in the archive candidate file storage unit 17. The tiering unit 16 may also prioritize the files managed by the file system 15 from the viewpoint that they are unlikely to be used in the future. For example, files unlikely to be used in the future may include files for which a predetermined period has elapsed since the last access, files stored in a predetermined folder, and files whose access permission has been changed to read-only.

The table management unit 11 includes a file division unit 111 and a hash value generation unit 112. The file division unit 111 divides each file stored in the archive candidate file storage unit 17 into fixed-length or variable-length blocks. The hash value generation unit 112 then calculates the hash value of each block. The table management unit 11 registers all the calculated hash values in the local hash table 18.

The file division unit 111 and the hash value generation unit 112 divide files and calculate hash values using logic equivalent to the function that the deduplication storage 3 provides for deduplication. That is, for identical file data, the block division result and the calculated hash values are the same whether the data is processed by the deduplication storage 3 or by the table management unit 11.

As shown in FIG. 3, the local hash table 18 includes a "hash value" column and a "duplicate count" column. It registers each hash value together with a duplicate count indicating the number of times that hash value was generated. FIG. 3 shows k hash values, but the number may be arbitrary. The local hash table 18 is structured so that any hash value is registered without duplication. If a hash value has been generated twice, its duplicate count is "2"; likewise, if it has been generated n times, its duplicate count is "n". When a hash value has been identified as corresponding to a block held by the deduplication storage 3, its duplicate count in the local hash table 18 is expressed as the "maximum value". A duplicate count of "maximum value" therefore indicates that the block corresponding to that hash value exists in the deduplication storage 3. The process that sets the duplicate count to the "maximum value" is described later.

The local hash table 18 is structured so that, using any hash value as a key, whether that hash value is registered and its corresponding duplicate count can be looked up at high speed. Using hash values for lookup allows efficient use of the storage area and fast searching. The hash values may be generated by a general hash algorithm such as MD5 (Message Digest Algorithm 5) or SHA (Secure Hash Algorithm).
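The construction of the local hash table described above can be sketched as follows. This is a minimal illustration only: the names `BLOCK_SIZE`, `split_fixed`, `block_hash`, and `build_local_hash_table` are hypothetical, fixed-length division is assumed (the text also allows variable-length blocks), and SHA-256 is chosen merely as one member of the SHA family.

```python
import hashlib
from collections import Counter

BLOCK_SIZE = 4096  # hypothetical fixed block length in bytes (an assumption)


def split_fixed(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Split file contents into fixed-length blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_hash(block: bytes) -> str:
    """Hash one block; SHA-256 stands in for whatever the storage actually uses."""
    return hashlib.sha256(block).hexdigest()


def build_local_hash_table(files, block_size: int = BLOCK_SIZE) -> Counter:
    """Map each distinct hash value to the number of times it was generated."""
    table = Counter()
    for data in files:
        for block in split_fixed(data, block_size):
            table[block_hash(block)] += 1
    return table


# Two tiny "archive candidate files", split into 4-byte blocks for readability.
files = [b"A" * 8 + b"B" * 4, b"A" * 4 + b"C" * 4]
local_table = build_local_hash_table(files, block_size=4)
```

In this toy input the block `AAAA` is generated three times across the two files, so its entry carries a duplicate count of 3, while the other two blocks each have a count of 1. A dictionary keyed by hash value gives the duplicate-free registration and fast lookup that the text calls for.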

The table management unit 11 registers, in the file management table 19, the file pointer of each file that has been divided into blocks, the file size, all the calculated hash values, and the block sizes.

As shown in FIG. 4, the file management table 19 includes a "file pointer" column, a "file size" column, an "archive priority" column indicating the archive priority of the file, and a "hash value list" column. The file management table 19 is a storage area in which any number of rows of these data can be registered. "File size" indicates the size of the file. "Archive priority" indicates the priority with which the file is sent to the deduplication storage 3. In this embodiment, the "archive priority" is expressed in the range of 0 (initial value) to 100. The "archive priority" is an index of what proportion of the whole file would duplicate data already stored in the deduplication storage 3 if the file were stored there. Files with a high "archive priority" value may be sent to the deduplication storage 3 preferentially. The method of setting the "archive priority" is described later.
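One possible reading of the "archive priority" index defined above is sketched below. The specification describes the actual setting method later, so this is not that method; it only assumes the priority is the percentage of the file's bytes whose blocks are marked as already present in the deduplication storage. The names `MAX_COUNT` and `estimate_archive_priority`, and the use of infinity as the "maximum value" marker, are hypothetical.

```python
MAX_COUNT = float("inf")  # hypothetical stand-in for the "maximum value" marker


def estimate_archive_priority(file_size, hash_list, local_table):
    """Return 0-100: share of the file whose blocks are already in the storage.

    file_size   -- size of the file in bytes
    hash_list   -- iterable of (hash value, accumulated block size in bytes)
    local_table -- dict mapping hash value -> duplicate count or MAX_COUNT

    Assumption: only blocks whose count equals MAX_COUNT are treated as
    duplicated; duplicates among candidate files are ignored in this sketch.
    """
    if file_size <= 0:
        return 0
    stored = sum(size for h, size in hash_list if local_table.get(h) == MAX_COUNT)
    return round(100 * stored / file_size)


local_table = {"#1": MAX_COUNT, "#2": 3, "#3": MAX_COUNT}
hash_list = [("#1", 40_000), ("#2", 30_000), ("#3", 30_000)]
priority = estimate_archive_priority(100_000, hash_list, local_table)
```

Here 70,000 of the file's 100,000 bytes belong to blocks marked as stored, so the sketch yields a priority of 70. A fuller calculation might also weigh duplicate counts greater than one, which this illustration deliberately leaves out.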

The "hash value list" is a list of the hash values corresponding to all the blocks that make up the file. It holds pairs, each consisting of a "hash value" generated from a block of the file and a "block size" indicating the size of that block. The "hash value list" is structured so that any number of such pairs can be recorded without duplication. If the same hash value is already registered in the "hash value list", the second and later registrations do not add a new hash value; instead, the block size is added to the size already recorded for that hash value. For example, suppose that a block of file A shown in FIG. 4 produces hash value #3 and that the size of the block is 30 KB. Suppose also that, while the entry for file A is being generated, a block with hash value #3 has already been registered, so that the pair for #3 in the hash value list reads (#3, 30 KB). When hash value #3 is generated a second time from a block of file A, the block size is added to the pair for #3, which becomes (#3, 60 KB).
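The accumulation rule for the hash value list can be illustrated with a small sketch. The class `FileEntry` and the function `add_block` are hypothetical names, and a dictionary is assumed as the duplicate-free structure; sizes are given in KB to mirror the file A example.

```python
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    """One row of a file management table (field names are assumptions)."""
    file_pointer: str
    file_size_kb: int
    archive_priority: int = 0  # 0 is the initial value
    hash_list: dict = field(default_factory=dict)  # hash value -> total size in KB


def add_block(entry: FileEntry, hash_value: str, block_size_kb: int) -> None:
    """Register a block: new hashes are added, repeated hashes accumulate size."""
    entry.hash_list[hash_value] = entry.hash_list.get(hash_value, 0) + block_size_kb


file_a = FileEntry(file_pointer="/export/fileA", file_size_kb=80)
add_block(file_a, "#1", 20)
add_block(file_a, "#3", 30)
add_block(file_a, "#3", 30)  # second block with hash #3: size becomes 60 KB
```

After the three calls the list holds two pairs, (#1, 20 KB) and (#3, 60 KB), matching the behavior described for file A.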

The duplicate count totaling server 2 includes a global hash table management unit 21 and a global hash table 22. The duplicate count totaling server 2 totals how many times a given block is duplicated across all the files extracted to the archive candidate file storage units 17 of the file servers 1 to 1n. The duplicate count totaling server 2 also has a function of inquiring of the deduplication storage 3 whether the deduplication storage 3 holds that block.

Like the local hash table 18 shown in FIG. 3, the global hash table 22 includes a "hash value" column and a "duplicate count" column.

The global hash table management unit 21 collects the hash values and their duplicate counts from the local hash tables 18 held by the file servers 1 to 1n. The global hash table management unit 21 registers each collected pair of hash value and duplicate count in the global hash table 22. When a collected hash value appears on more than one file server, the global hash table management unit 21 sums the duplicate counts and registers the total in the global hash table 22.
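The aggregation step can be sketched as a sum of per-server tables. The function name `merge_local_tables` and the sample tables are hypothetical; `collections.Counter.update` is used because it adds counts for keys that already exist.

```python
from collections import Counter


def merge_local_tables(local_tables) -> Counter:
    """Sum duplicate counts per hash value across all file servers."""
    global_table = Counter()
    for table in local_tables:
        global_table.update(table)  # adds counts rather than overwriting them
    return global_table


server_1 = {"#1": 2, "#2": 1}
server_2 = {"#1": 1, "#3": 4}
global_table = merge_local_tables([server_1, server_2])
```

Hash value #1 appears on both servers, so its global duplicate count is 2 + 1 = 3, while #2 and #3 keep their single-server counts.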

The global hash table management unit 21 sends every hash value collected from the file servers 1 to 1n, without duplication, to the deduplication storage 3 and queries whether the deduplication storage 3 holds the block corresponding to each hash value. When the response from the deduplication storage 3 shows that the block corresponding to a queried hash value is held in the deduplication storage 3, the global hash table management unit 21 updates the duplicate count for that hash value in the global hash table 22 to the "maximum value".

The global hash table management unit 21 makes this duplication query to the deduplication storage 3 for every hash value in the global hash table 22. After all the queries are complete, the global hash table management unit 21 sends every pair of a hash value and its duplicate count in the global hash table 22 to the file servers 1 to 1n.
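The "maximum value" update can be illustrated as below. The specification does not say how the maximum value is encoded, so the sentinel `MAX_COUNT` (floating-point infinity here) and the function name are assumptions made purely for the sketch.

```python
MAX_COUNT = float("inf")  # assumed encoding of the "maximum value"


def mark_stored_blocks(global_table: dict, stored_hashes: set) -> dict:
    """Set the duplicate count to MAX_COUNT for hashes already held in storage."""
    for hash_value in global_table:
        if hash_value in stored_hashes:
            global_table[hash_value] = MAX_COUNT
    return global_table


# Hypothetical global hash table and query result from the deduplication storage.
table = {"#1": 2, "#3": 5, "#7": 1}
mark_stored_blocks(table, stored_hashes={"#3"})
print(table)  # {'#1': 2, '#3': inf, '#7': 1}
```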

In the present embodiment, the duplicate count totaling server 2 is placed outside the file server 1 and the deduplication storage 3, but it may instead be provided in any of the file servers 1 to 1n or in the deduplication storage 3.

The deduplication storage 3 includes a duplication determination unit 31 and a block management table 32. The deduplication storage 3 may be a general deduplication storage that can compress files by means of a deduplication function that improves the recording efficiency of the storage. The deduplication storage 3 divides a file into fixed-length or variable-length blocks according to a predetermined logic and calculates a hash value for the data of each block using a predetermined hash function. The deduplication storage 3 registers each block and its calculated hash value in the block management table 32. If the same hash value already exists in the block management table 32 at the time of registration, the corresponding block is judged to be a duplicate; the block is not recorded in the block management table 32, and only pointer information is recorded in the deduplication storage 3. Such a deduplication function is adopted by, for example, the NEC iStorage HS series.

The deduplication storage 3 includes the duplication determination unit 31, which has the deduplication function. For any given hash value, the duplication determination unit 31 determines whether the deduplication storage 3 holds the corresponding block.

In the present embodiment, the deduplication storage 3 is included in the deduplication system, but it may be placed outside the deduplication system.

(Operation and effects)
As described above, the file server 1 according to the present embodiment includes the table management unit 11, the transmission unit 12, the reception unit 13, and the priority calculation unit 14. The table management unit 11 generates a hash value for each of the plurality of blocks into which a file that is a candidate for archiving to the deduplication storage is divided. The table management unit 11 also generates a local hash table containing the hash values and, for each hash value, a duplicate count indicating the number of times that hash value was generated. The transmission unit 12 sends the local hash table to the duplicate count totaling server. The reception unit 13 receives from the duplicate count totaling server a global hash table containing the hash values and the duplicate counts updated according to whether each of the blocks exists in the deduplication storage. The priority calculation unit 14 calculates the archive priority of the file based on the duplicate counts of the local hash table, updated from the hash values and duplicate counts of the global hash table, together with the size of the file and the block size corresponding to each hash value.

As a result, files with a large deduplication effect are identified by prioritization before any file is sent to the deduplication storage, so the files to be archived on the file server can be selected effectively and archived to the deduplication storage 3. This is possible because an archive priority is calculated for each file extracted into the archive candidate file storage unit 17, which gives an estimate of what proportion of the whole file would be duplicated if the file were stored in the deduplication storage 3. This prevents a decrease in the usage efficiency of the deduplication storage.

(Processing flow of the deduplication system)
FIGS. 5 to 9 are diagrams showing a method of setting priorities according to an embodiment of the present invention. The steps are described below in order, from the stage at which the file server creates the local hash table to the stage at which the file server sets the priorities.

(Processing flow of the local hash table on the file server)
FIG. 5 is a diagram showing a method by which a file server according to an embodiment of the present invention transmits the local hash table. The file system 15 of each of the file servers 1 to 1n stores the files of that file server. The tiering unit 16 determines the access frequency of the files stored in the file system 15 and stores files whose access frequency is lower than a predetermined access frequency in the archive candidate file storage unit 17 (step S501). The table management unit 11 sequentially retrieves the files stored in the archive candidate file storage unit 17 (step S502). The tiering unit 16 may also store, in the archive candidate file storage unit 17, files that are unlikely to be used in the future. For example, a file may be judged unlikely to be used in the future when a predetermined period has elapsed since the last access to the file, when the file has been stored in a predetermined folder, or when the access right of the file has been changed to read-only.
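As one illustration of candidate selection in step S501, the sketch below uses only the "predetermined period since the last access" criterion. The function name, the dictionary of last-access times, and the 180-day period are all assumptions chosen for the example; the specification does not fix any of them.

```python
from datetime import datetime, timedelta


def select_archive_candidates(last_access: dict, now: datetime, max_idle: timedelta) -> list:
    """Return the files whose last access is older than max_idle."""
    return [path for path, accessed in last_access.items() if now - accessed > max_idle]


# Hypothetical last-access times recorded by the file system.
now = datetime(2020, 5, 12)
last_access = {
    "report.doc": datetime(2020, 5, 1),
    "old_log.txt": datetime(2019, 1, 15),
}
candidates = select_archive_candidates(last_access, now, timedelta(days=180))
print(candidates)  # ['old_log.txt']
```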

The table management unit 11 adds a new row to the file management table 19 in order to manage the file retrieved in step S502. The table management unit 11 registers, in the file pointer column of that row, the file pointer of the file retrieved in step S502 and the size of the file (step S503). The file division unit 111 of the table management unit 11 divides the file into variable-length or fixed-length blocks (step S504). Depending on its size, the file is divided into one block or a plurality of blocks.

The hash value generation unit 112 of the table management unit 11 calculates a hash value for every block generated in step S504 (step S505). The table management unit 11 registers all the calculated hash values in the local hash table 18 (step S506). Because of the structure of the local hash table 18, each hash value is registered only once, and repetition is expressed by the value in the "duplicate count" column.

The table management unit 11 registers all the hash values calculated in step S505, together with the sizes of the corresponding blocks, in the "hash value list" column of the row for that file in the file management table 19 (the row added in step S503) (step S507). Because of the structure of the hash value list, each hash value is registered only once; when the same hash value is already registered, no new entry is added on the second and subsequent occasions, and the size of the block is added to the size recorded for that hash value.

The table management unit 11 determines whether there is any unprocessed block not yet registered in the local hash table 18 and the file management table 19 (step S508). If there is (step S508: Yes), the process returns to calculating the hash value of the unprocessed block (step S505). If not (step S508: No), the table management unit 11 determines whether any unprocessed file remains in the archive candidate file storage unit 17 (step S509). If there is an unprocessed file (step S509: Yes), the process returns to retrieving an unprocessed file from the archive candidate file storage unit 17 (step S502). If there is none (step S509: No), the transmission unit 12 of the file server 1 sends the contents of the local hash table 18 to the duplicate count totaling server 2 (step S510).
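Steps S502 to S509 can be sketched as a pair of nested loops. The following is an illustration under several assumptions that are not in the specification: SHA-256 stands in for the "predetermined hash function", blocks are fixed-length with a tiny 4-byte size, and both tables are plain dictionaries.

```python
import hashlib

BLOCK_SIZE = 4  # bytes; deliberately tiny fixed-length block for illustration


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Divide file contents into fixed-length blocks (step S504)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def build_tables(files: dict):
    """Build the local hash table and the file management table."""
    local_hash_table = {}  # hash value -> duplicate count
    file_table = {}        # file pointer -> row
    for pointer, data in files.items():                                # S502
        row = {"size": len(data), "priority": 0.0, "hash_list": {}}    # S503
        for block in split_blocks(data):                               # S504
            h = hashlib.sha256(block).hexdigest()                      # S505
            local_hash_table[h] = local_hash_table.get(h, 0) + 1       # S506
            row["hash_list"][h] = row["hash_list"].get(h, 0) + len(block)  # S507
        file_table[pointer] = row
    return local_hash_table, file_table


# Hypothetical archive candidate files.
local_table, file_table = build_tables({"a.bin": b"AAAABBBBAAAA", "b.bin": b"AAAA"})
```

In this example the block `AAAA` occurs three times across the two files, so its duplicate count in the local hash table is 3, while the hash value list of `a.bin` records it once with an accumulated size of 8 bytes.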

(Processing flow of the duplicate count totaling server)
FIG. 6 is a diagram showing an operation method of the duplicate count totaling server according to an embodiment of the present invention. The duplicate count totaling server 2 receives the hash values constituting the local hash table 18, and their duplicate counts, sent by each of the file servers 1 to 1n in step S510 (step S601). The global hash table management unit 21 of the duplicate count totaling server 2 registers each pair of a hash value and its duplicate count received in step S601 in the global hash table 22 (step S602). Because of the structure of the global hash table 22, each hash value is registered only once, and repetition is expressed by the summed value in the "duplicate count" column.

The global hash table management unit 21 determines whether a local hash table has been sent from another file server (step S603). If so (step S603: Yes), the process returns to receiving the contents of the local hash table from that file server (step S601). If not (step S603: No), the global hash table management unit 21 sends all the hash values registered in the global hash table 22, without duplication, to the deduplication storage 3 (step S604). In this way, the global hash table management unit 21 queries whether the blocks corresponding to the hash values registered in the global hash table 22 are held in the deduplication storage 3.

The global hash table management unit 21 determines whether a hash value has been received from the deduplication storage 3 (step S605). If so (step S605: Yes), the global hash table management unit 21 updates the duplicate count in the global hash table 22 corresponding to the hash value received in step S605 to the "maximum value" (step S606). After step S606, the process returns to determining whether a further hash value has been received from the deduplication storage 3 (step S605). If no hash value has been received (step S605: No), the global hash table management unit 21 sends the global hash table 22 to each of the file servers 1 to 1n (step S607), and the processing flow ends.

In the present embodiment, updating the duplicate count to the "maximum value" indicates that the corresponding block exists in the deduplication storage 3. Alternatively, another means may be used to indicate that the corresponding block exists in the deduplication storage 3.

(Processing flow of the deduplication storage)
FIG. 7 is a diagram showing an operation method of the deduplication storage according to an embodiment of the present invention. The deduplication storage 3 receives hash values from the duplicate count totaling server 2 (step S701). Step S701 corresponds to step S604 of FIG. 6. The duplication determination unit 31 of the deduplication storage 3 determines whether the block corresponding to a hash value received in step S701 is held in the block management table 32 (step S702). If the block is held (step S702: Yes), the deduplication storage 3 sends the hash value received in step S701 back to the duplicate count totaling server 2 unchanged (step S703). If the block is not held (step S702: No), or after step S703, the deduplication storage 3 determines whether there is another received hash value (step S704). If there is (step S704: Yes), the process returns to the determination by the duplication determination unit 31 of whether the corresponding block is held in the block management table 32 (step S702). If there is none (step S704: No), the processing flow of the deduplication storage 3 ends.
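The storage-side lookup of steps S701 to S704 amounts to echoing back the hash values whose blocks are already held. A minimal sketch, assuming the block management table can be queried like a dictionary keyed by hash value (the function name and table layout are hypothetical):

```python
def find_held_hashes(received_hashes: list, block_management_table: dict) -> list:
    """Return the received hash values whose blocks the storage already holds."""
    held = []
    for hash_value in received_hashes:            # S701, looped via S704
        if hash_value in block_management_table:  # S702
            held.append(hash_value)               # S703: send back unchanged
    return held


# Hypothetical block management table: hash value -> stored block data.
block_table = {"#3": b"block-3", "#8": b"block-8"}
reply = find_held_hashes(["#1", "#3", "#7"], block_table)
print(reply)  # ['#3']
```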

(Update flow of the local hash table on the file server)
FIG. 8 is a diagram showing a method by which a file server according to an embodiment of the present invention updates the local hash table. The reception unit 13 of each of the file servers 1 to 1n receives the hash values constituting the global hash table 22, and their duplicate counts, sent by the duplicate count totaling server 2 in step S607 (step S801). The table management unit 11 of each of the file servers 1 to 1n determines whether a hash value received in step S801 is registered in the local hash table 18 (step S802). If it is registered (step S802: Yes), the duplicate count of that hash value in the local hash table 18 is updated to the duplicate count received in step S801 (step S803). If it is not registered (step S802: No), or after step S803, the table management unit 11 determines whether there is another hash value for which the determination of step S802 has not yet been made (step S804). If there is (step S804: Yes), the table management unit 11 returns to determining whether that hash value is registered in the local hash table 18 (step S802). If there is none (step S804: No), the table management unit 11 sets the "archive priority" column of the file management table 19 (step S805). The method of setting the "archive priority" column is described later.
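The update in steps S801 to S804 overwrites local duplicate counts with the global ones, but only for hash values the file server already knows. A short sketch, again assuming dictionary-based tables (an illustrative choice, not the specification's data layout):

```python
def update_local_table(local_table: dict, global_table: dict) -> dict:
    """Overwrite local duplicate counts with the counts from the global table."""
    for hash_value, count in global_table.items():  # S801, looped via S804
        if hash_value in local_table:               # S802
            local_table[hash_value] = count         # S803
    return local_table


# Hypothetical tables: "#7" belongs to another file server and is ignored.
local = {"#1": 2, "#3": 1}
received = {"#1": 2, "#3": 5, "#7": 1}
update_local_table(local, received)
print(local)  # {'#1': 2, '#3': 5}
```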

(Archive priority setting flow on the file server)
FIG. 9 is a diagram showing a method by which a file server according to an embodiment of the present invention sets priorities. The priority calculation unit 14 sequentially retrieves, from the file management table 19, rows each containing a file pointer, a file size, an archive priority, and a hash value list (step S901). From the hash value list retrieved in step S901, the priority calculation unit 14 sequentially retrieves pairs of a hash value corresponding to a block of the file and a block size (step S902). Using the hash value retrieved in step S902 as a key, the priority calculation unit 14 retrieves from the local hash table 18 the duplicate count of the block corresponding to that hash value (step S903).

The priority calculation unit 14 calculates the occupancy rate (%) of the block in the file being processed from the file size obtained in step S901 and the block size obtained in step S902. The priority calculation unit 14 then calculates, from the duplicate count obtained in step S903 and the calculated occupancy rate, a block priority indicating the priority that the block contributes to the file being processed (step S904).

The method of calculating the block priority is described below, taking as an example the case where priorities are assigned in the range from a block priority of 0 (lowest priority) to a block priority of 100 (highest priority). In the present embodiment, a predetermined duplicate count expected of the deduplication storage 3 (the expected value) is set in advance.

The priority that the block contributes to the file being processed (the block priority) is calculated by the following formula, in which the occupancy rate is applied as a proportion, so that an occupancy rate of 50% multiplies the deemed priority by 0.5.
Block priority = deemed priority × occupancy rate (%)

The deemed priority is calculated from the duplicate count of the block obtained in step S903. When the duplicate count is the "maximum value", the priority calculation unit 14 sets the deemed priority to 100. When the duplicate count is not the "maximum value" but is equal to or greater than the predetermined expected value, the priority calculation unit 14 also sets the deemed priority to 100. When the duplicate count is less than the predetermined expected value, the deemed priority is determined by the ratio of the duplicate count to the predetermined expected value, as in the following formula.
Deemed priority = 100 × (duplicate count / predetermined expected value)
For example, when the duplicate count is 2 and the predetermined expected value is 20, the deemed priority is 10. When the duplicate count is 16 and the predetermined expected value is 20, the deemed priority is 80.
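The deemed priority rule can be checked with a small function. The sentinel `MAX_COUNT` used for the "maximum value" is an assumption for this sketch, as is the function name; the two worked examples from the text are reproduced.

```python
MAX_COUNT = float("inf")  # assumed encoding of the "maximum value"


def deemed_priority(duplicate_count: float, expected_value: int) -> float:
    """Return the deemed priority on a 0 to 100 scale."""
    if duplicate_count == MAX_COUNT or duplicate_count >= expected_value:
        return 100.0
    return 100.0 * duplicate_count / expected_value


print(deemed_priority(2, 20))          # 10.0
print(deemed_priority(16, 20))         # 80.0
print(deemed_priority(MAX_COUNT, 20))  # 100.0
```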

As for the block priority, when the deemed priority is 10 and the occupancy rate is 50%, the block priority is 5. When the deemed priority is 80 and the occupancy rate is 50%, the block priority is 40. When the deemed priority is 100 and the occupancy rate is 100%, the block priority is 100. This concludes the method of calculating the block priority.
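These three examples can be reproduced by treating the occupancy rate as the block size divided by the file size. The function below is a sketch with hypothetical names; the sizes are given in arbitrary but matching units.

```python
def block_priority(deemed: float, block_size: int, file_size: int) -> float:
    """Block priority = deemed priority x occupancy rate (as a proportion)."""
    occupancy = block_size / file_size  # e.g. 50 / 100 -> 0.5 (50%)
    return deemed * occupancy


print(block_priority(10, 50, 100))    # 5.0
print(block_priority(80, 50, 100))    # 40.0
print(block_priority(100, 100, 100))  # 100.0
```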

The priority calculation unit 14 adds the block priority calculated in step S904 to the archive priority retrieved in step S901 and updates the file management table 19 (step S905).

The priority calculation unit 14 determines whether the hash value list of the file contains an unprocessed pair of a hash value and a block size (step S906). If there is an unprocessed pair (step S906: Yes), the priority calculation unit 14 returns to retrieving the unprocessed pair (step S902). If there is none (step S906: No), the priority calculation unit 14 determines whether any of the files registered in the file management table 19 remains unprocessed (step S907). If there is an unprocessed file (step S907: Yes), the process returns to retrieving the row corresponding to the unprocessed file from the file management table 19 (step S901). If there is none (step S907: No), the processing flow ends.
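Putting steps S902 to S906 together, the archive priority of one file is the sum of its block priorities. The sketch below assumes dictionary-based tables and an infinite sentinel for the "maximum value"; those choices, the function name, and the sample numbers are illustrative only.

```python
MAX_COUNT = float("inf")  # assumed encoding of the "maximum value"


def archive_priority(file_size: int, hash_list: dict, local_table: dict, expected: int) -> float:
    """Sum the block priorities of every (hash value, block size) pair of a file."""
    total = 0.0
    for hash_value, size in hash_list.items():      # S902, looped via S906
        count = local_table[hash_value]             # S903
        # MAX_COUNT also satisfies count >= expected, giving 100.
        deemed = 100.0 if count >= expected else 100.0 * count / expected
        total += deemed * size / file_size          # S904 and S905
    return total


# Hypothetical 100 KB file: half its data has hash #1 (duplicate count 2),
# the other half has hash #3 (already held in the deduplication storage).
priority = archive_priority(
    file_size=100,
    hash_list={"#1": 50, "#3": 50},
    local_table={"#1": 2, "#3": MAX_COUNT},
    expected=20,
)
print(priority)  # 55.0
```

Because the hash value list accumulates the sizes of repeated blocks, each entry's occupancy rate already covers every occurrence of that block in the file.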

The steps have been described above in order, from the stage at which the file server creates the local hash table to the stage at which the file server sets the priorities.

(Operation and effects)
As described above, in the processing method of the file server 1 according to the present embodiment, the table management unit 11 of the file server 1 generates a hash value for each of the plurality of blocks into which a file that is a candidate for archiving to the deduplication storage 3 is divided. The table management unit 11 generates the local hash table 18 containing the hash values and, for each hash value, a duplicate count indicating the number of times that hash value was generated. The transmission unit 12 of the file server 1 sends the local hash table 18 to the duplicate count totaling server 2. The reception unit 13 of the file server 1 receives from the duplicate count totaling server 2 the global hash table 22 containing the hash values and the duplicate counts updated according to whether each of the blocks exists in the deduplication storage 3. The priority calculation unit 14 of the file server 1 calculates the archive priority of the file based on the duplicate count corresponding to each hash value in the local hash table 18, updated from the hash values and duplicate counts contained in the global hash table 22, together with the size of the file and the block size corresponding to each hash value.

(Variation of the present embodiment)
The file server 1 according to the present embodiment has been described in detail above, but the specific form of the file server 1 is not limited to what is described above, and various design changes can be made without departing from the gist.

For example, as a variation of the present embodiment, after processing of the target files in FIG. 9 is complete (step S907: No), the priority calculation unit 14 determines, based on the archive priority, whether each file is to be archived to the deduplication storage. The transmission unit 12 then sends the files determined to be archive targets to the deduplication storage 3.
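The specification does not state how the archive priority is turned into an archive decision. One possible reading, sketched below, is a simple threshold followed by sorting so that higher-priority files are sent first; the threshold value, the function name, and the file names are all assumptions.

```python
def select_for_archive(priorities: dict, threshold: float) -> list:
    """Return files at or above the threshold, highest archive priority first."""
    chosen = [name for name, p in priorities.items() if p >= threshold]
    return sorted(chosen, key=lambda name: priorities[name], reverse=True)


# Hypothetical archive priorities taken from the file management table.
priorities = {"a.bin": 55.0, "b.bin": 10.0, "c.bin": 90.0}
targets = select_for_archive(priorities, threshold=50.0)
print(targets)  # ['c.bin', 'a.bin']
```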

(Operation and effects)
With this variation of the present embodiment, files with a large deduplication effect are sent to the deduplication storage preferentially. This prevents a decrease in the usage efficiency of the deduplication storage.

Each of the processes described above is stored in the form of a program on a computer-readable recording medium, and the processing is performed when a CPU of the file server 1 or of the deduplication system reads and executes the program. Here, a computer-readable recording medium means a magnetic disk, a magneto-optical disk, a CD-ROM, a DVD-ROM, a semiconductor memory, or the like. The computer program may also be distributed to a computer over a communication line, and the computer receiving the distribution may execute the program.

The program may be one that implements only some of the functions described above. It may also be a so-called difference file (difference program), which implements the functions described above in combination with a program already recorded in the computer system. The computer may consist of a single computer or of a plurality of computers connected so that they can communicate with each other.

Several embodiments according to the present disclosure have been described above, but all of these embodiments are presented as examples and are not intended to limit the scope of the invention. These embodiments can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. These embodiments and their variations are included in the scope and gist of the invention, and likewise in the invention described in the claims and its equivalents.

1 File server
11 Table management unit
111 File division unit
112 Hash value generation unit
12 Transmission unit
13 Reception unit
14 Priority calculation unit
15 File system
16 Tiering unit
17 Archive candidate file storage unit
18 Local hash table
19 File management table
2 Duplicate count totaling server
21 Global hash table management unit
22 Global hash table
3 Deduplication storage
31 Duplication determination unit
32 Block management table

Claims

A file server comprising:
a table management unit that generates a hash value for each of a plurality of blocks obtained by dividing a candidate file for archiving to deduplication storage, and generates a local hash table containing, for each hash value, the hash value and a number of duplicates indicating how many times that same hash value was generated;
a transmission unit that transmits the local hash table to a duplicate count aggregation server;
a reception unit that receives, from the duplicate count aggregation server, a global hash table containing hash values and numbers of duplicates updated based on whether each of the plurality of blocks exists in the deduplication storage; and
a priority calculation unit that calculates an archive priority of the file based on the number of duplicates corresponding to each hash value in the local hash table, as updated based on the hash values and numbers of duplicates contained in the global hash table, the size of the file, and the block size corresponding to the hash value.
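The table management step recited above can be illustrated with a minimal sketch. It assumes fixed-length blocks and SHA-256 as the hash function; neither is specified by the claim, and the names `BLOCK_SIZE`, `split_into_blocks`, and `build_local_hash_table` are hypothetical.

```python
import hashlib
from collections import Counter

# Assumption: fixed-length blocks of 4 KiB. The claims also allow variable length.
BLOCK_SIZE = 4096


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Divide file contents into consecutive blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def build_local_hash_table(files: dict[str, bytes],
                           block_size: int = BLOCK_SIZE) -> Counter:
    """Map each block hash to the number of times that hash was generated.

    Assumption: SHA-256 stands in for the unspecified hash function.
    """
    table: Counter = Counter()
    for data in files.values():
        for block in split_into_blocks(data, block_size):
            table[hashlib.sha256(block).hexdigest()] += 1
    return table


# Usage: two candidate files that share the 4-byte block b"AAAA".
candidates = {"a.bin": b"A" * 8 + b"B" * 4, "b.bin": b"A" * 4}
local_table = build_local_hash_table(candidates, block_size=4)
```

In this sketch the count for a hash is simply how often it was produced across all archive candidate files on one file server, which matches the wording "number of times the same hash value is generated".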

The file server according to claim 1, wherein at least one of the numbers of duplicates in the global hash table containing the updated numbers of duplicates indicates a maximum value, and the maximum value indicates that the block from which the corresponding hash value is generated exists in the deduplication storage.

The file server according to claim 2, wherein the priority calculation unit
calculates a block occupancy rate based on the block size and the file size, calculates a deemed priority based on the number of duplicates in the updated local hash table and a predetermined expected value, calculates a block priority based on the block occupancy rate and the deemed priority, and calculates the archive priority as the sum of the block priorities calculated for the file,
wherein the deemed priority is 100% when the number of duplicates is the maximum value, and
the deemed priority is 100% when the number of duplicates is not the maximum value and is equal to or greater than the predetermined expected value.
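The priority calculation recited above can be sketched as follows. Several details are assumptions rather than part of the claim: the block occupancy rate is taken as block size divided by file size, the block priority as the product of occupancy rate and deemed priority, and the deemed priority below the expected value as the ratio of the number of duplicates to the expected value (the claim only fixes the two 100% cases). `MAX_DUPLICATES` is a hypothetical sentinel for the maximum value.

```python
# Hypothetical sentinel meaning "this block already exists in deduplication storage".
MAX_DUPLICATES = 2**32 - 1


def deemed_priority(duplicates: int, expected: int) -> float:
    """Return the deemed priority as a fraction (1.0 == 100%)."""
    if duplicates == MAX_DUPLICATES or duplicates >= expected:
        return 1.0  # both 100% cases stated in the claim
    # Assumption: the claim does not define this case; a simple ratio is used.
    return duplicates / expected


def archive_priority(blocks: list[tuple[str, int]], file_size: int,
                     local_table: dict[str, int], expected: int) -> float:
    """Sum the block priorities of one file.

    blocks: (hash value, block size in bytes) for each block of the file.
    expected: predetermined expected value, assumed to be positive.
    """
    total = 0.0
    for hash_value, block_size in blocks:
        occupancy = block_size / file_size  # assumed form of the occupancy rate
        total += occupancy * deemed_priority(local_table[hash_value], expected)
    return total


# Usage: a 100-byte file made of three blocks.
table = {"h1": MAX_DUPLICATES, "h2": 4, "h3": 1}
priority = archive_priority([("h1", 50), ("h2", 25), ("h3", 25)], 100, table, expected=4)
```

Under these assumptions, a file whose blocks are all already stored, or all duplicated at least the expected number of times, reaches a priority of 1.0 (100%).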

The file server according to any one of claims 1 to 3, wherein the archive candidate file is divided into the plurality of blocks at a fixed length or a variable length.

The file server according to any one of claims 1 to 4, wherein the priority calculation unit determines, based on the archive priority, whether the file is to be archived to the deduplication storage, and
the transmission unit transmits the file determined to be an archive target to the deduplication storage.

A deduplication system comprising one or more file servers and a duplicate count aggregation server, wherein
the one or more file servers each generate a hash value for each of a plurality of blocks obtained by dividing a candidate file for archiving to deduplication storage, generate a local hash table containing, for each hash value, the hash value and a number of duplicates indicating how many times that same hash value was generated, and transmit the local hash table to the duplicate count aggregation server,
the duplicate count aggregation server generates a global hash table by aggregating the one or more local hash tables, uses the hash values contained in the global hash table to inquire whether the block corresponding to each hash value exists in the deduplication storage, updates, in the global hash table, the number of duplicates corresponding to each block that exists in the deduplication storage, and transmits the updated global hash table to the one or more file servers, and
the one or more file servers update the local hash table based on the hash values and numbers of duplicates contained in the updated global hash table, and calculate an archive priority of the file based on the number of duplicates corresponding to each hash value in the updated local hash table, the size of the file, and the block size corresponding to the hash value.
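The exchange between the file servers and the duplicate count aggregation server can be sketched as below. The sketch assumes that "aggregating" means summing the counts per hash value, that a stored block is marked with the maximum value described in claim 2, and that a file server overwrites its local count with the global one. The function names, the `MAX_DUPLICATES` sentinel, and the `exists_in_storage` callback (standing in for the inquiry to the deduplication storage) are all hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable

# Hypothetical sentinel for the "maximum value" of the number of duplicates.
MAX_DUPLICATES = 2**32 - 1


def aggregate(local_tables: Iterable[dict[str, int]]) -> Counter:
    """Build a global hash table by summing counts per hash value (assumption)."""
    global_table: Counter = Counter()
    for table in local_tables:
        global_table.update(table)  # Counter.update adds counts from a mapping
    return global_table


def mark_stored_blocks(global_table: dict[str, int],
                       exists_in_storage: Callable[[str], bool]) -> dict[str, int]:
    """Set the count to the maximum value for blocks already in deduplication storage."""
    return {h: MAX_DUPLICATES if exists_in_storage(h) else n
            for h, n in global_table.items()}


def update_local_table(local_table: dict[str, int],
                       global_table: dict[str, int]) -> dict[str, int]:
    """On a file server, replace each local count with the global count (assumption)."""
    return {h: global_table.get(h, n) for h, n in local_table.items()}


# Usage: two file servers; the block with hash "c" already exists in storage.
server1 = {"a": 2, "b": 1}
server2 = {"a": 1, "c": 5}
global_table = mark_stored_blocks(aggregate([server1, server2]), lambda h: h == "c")
updated1 = update_local_table(server1, global_table)
updated2 = update_local_table(server2, global_table)
```

With this design each file server sees how often its blocks occur across all servers, not just locally, before computing archive priorities.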

A processing method for a file server, comprising:
generating a hash value for each of a plurality of blocks obtained by dividing a candidate file for archiving to deduplication storage, and generating a local hash table containing, for each hash value, the hash value and a number of duplicates indicating how many times that same hash value was generated;
transmitting the local hash table to a duplicate count aggregation server;
receiving, from the duplicate count aggregation server, a global hash table containing hash values and numbers of duplicates updated based on whether each of the plurality of blocks exists in the deduplication storage; and
calculating an archive priority of the file based on the number of duplicates corresponding to each hash value in the local hash table, as updated based on the hash values and numbers of duplicates contained in the global hash table, the size of the file, and the block size corresponding to the hash value.

A processing method for a deduplication system, comprising:
generating a hash value for each of a plurality of blocks obtained by dividing a candidate file for archiving to deduplication storage, generating a local hash table containing, for each hash value, the hash value and a number of duplicates indicating how many times that same hash value was generated, and transmitting the local hash table to a duplicate count aggregation server;
generating a global hash table by aggregating one or more local hash tables, using the hash values contained in the global hash table to inquire whether the block corresponding to each hash value exists in the deduplication storage, updating, in the global hash table, the number of duplicates corresponding to each block that exists in the deduplication storage, and transmitting the updated global hash table to one or more file servers; and
updating the local hash table based on the hash values and numbers of duplicates contained in the updated global hash table, and calculating an archive priority of the file based on the number of duplicates corresponding to each hash value in the updated local hash table, the size of the file, and the block size corresponding to the hash value.

A program that causes a file server to function as:
table management means for generating a hash value for each of a plurality of blocks obtained by dividing a candidate file for archiving to deduplication storage, and generating a local hash table containing, for each hash value, the hash value and a number of duplicates indicating how many times that same hash value was generated;
transmission means for transmitting the local hash table to a duplicate count aggregation server;
reception means for receiving, from the duplicate count aggregation server, a global hash table containing hash values and numbers of duplicates updated based on whether each of the plurality of blocks exists in the deduplication storage; and
priority calculation means for calculating an archive priority of the file based on the number of duplicates corresponding to each hash value in the local hash table, as updated based on the hash values and numbers of duplicates contained in the global hash table, the size of the file, and the block size corresponding to the hash value.

A program that causes a deduplication system to function as:
means for generating a hash value for each of a plurality of blocks obtained by dividing a candidate file for archiving to deduplication storage, generating a local hash table containing, for each hash value, the hash value and a number of duplicates indicating how many times that same hash value was generated, and transmitting the local hash table to a duplicate count aggregation server;
means for generating a global hash table by aggregating one or more local hash tables, using the hash values contained in the global hash table to inquire whether the block corresponding to each hash value exists in the deduplication storage, updating, in the global hash table, the number of duplicates corresponding to each block that exists in the deduplication storage, and transmitting the updated global hash table to one or more file servers; and
means for updating the local hash table based on the hash values and numbers of duplicates contained in the updated global hash table, and calculating an archive priority of the file based on the number of duplicates corresponding to each hash value in the updated local hash table, the size of the file, and the block size corresponding to the hash value.