JP6113816B1

JP6113816B1 - Information processing system, information processing apparatus, and program

Info

Publication number: JP6113816B1
Application number: JP2015225923A
Authority: JP
Inventors: 石山 政浩; 松崎 秀則
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2015-11-18
Filing date: 2015-11-18
Publication date: 2017-04-12
Anticipated expiration: 2035-11-18
Also published as: JP2017097437A

Abstract

[Problem] To perform data deduplication in an information processing system.
[Solution] A data storage destination apparatus 3 receives first storage data from among a plurality of storage data of a storage unit size generated based on data, a first hash value corresponding to first fragment data of a duplicate detection size larger than the storage unit size, and a second hash value corresponding to at least part of the first storage data. The apparatus also receives second storage data of the storage unit size generated by an external apparatus 2 based on data to be stored. When the second hash value corresponding to at least part of the first storage data matches a third hash value corresponding to at least part of the second storage data, the apparatus determines whether the first hash value corresponding to the first fragment data matches a fourth hash value corresponding to second fragment data that is stored across a plurality of apparatuses, includes the second storage data, and has the duplicate detection size. If they match, the apparatus detects duplication with respect to the second fragment data.
[Selected drawing] FIG. 1

Description

The present embodiment relates to an information processing system, an information processing apparatus, and a program.

Object storage used for large-scale storage employs erasure codes to achieve fault tolerance.

U.S. Patent No. 7,992,037
JP 2010-79886 A

Rabin, Michael O. Fingerprinting by random polynomials. Center for Research in Computing Techn., Aiken Computation Laboratory, Univ., 1981.
openstack CLOUD SOFTWARE, "The Rings - swift 2.5.1.dev128 documentation - Open Stack Docs", [retrieved November 18, 2015], Internet <URL: http://docs.openstack.org/developer/swift/overview_ring.html>

The present embodiment provides an information processing system, an information processing apparatus, and a program that perform data deduplication.

The information processing system according to the present embodiment includes a first information processing apparatus and a plurality of second information processing apparatuses. The first information processing apparatus includes a generation unit, a calculation unit, and a transmission unit. The generation unit generates, based on data, a plurality of storage data each having a storage unit size. The calculation unit calculates a first hash value corresponding to first fragment data that is included in the data and has a duplicate detection size larger than the storage unit size, and calculates a second hash value corresponding to at least part of first storage data included in the first fragment data. The transmission unit transmits the first storage data, the first hash value, and the second hash value to a storage destination apparatus among the plurality of second information processing apparatuses. The storage destination apparatus includes a reception unit and a processing unit. The reception unit receives the first storage data, the first hash value, and the second hash value, and also receives second storage data of the storage unit size generated by an external information processing apparatus based on data to be stored. The processing unit determines whether the second hash value corresponding to at least part of the first storage data matches a third hash value corresponding to at least part of the second storage data received by the reception unit. When the second hash value matches the third hash value, the processing unit determines whether the first hash value corresponding to the first fragment data matches a fourth hash value corresponding to second fragment data that is stored in the plurality of second information processing apparatuses, includes the second storage data, and has the duplicate detection size. When the first hash value matches the fourth hash value, the processing unit detects duplication with respect to the second fragment data.

A block diagram illustrating the configuration of the information processing system according to the first embodiment.
A conceptual diagram illustrating the configuration of the information processing system according to the first embodiment.
A conceptual diagram showing an example of the relationship between an object and the disks of the fragment servers.
A diagram illustrating the disk selection index file.
A diagram illustrating the first to third methods of deduplication.
A diagram illustrating a comparison between ordinary hash calculation and rolling hash calculation.
A flowchart illustrating the fragment storage process.
A conceptual diagram illustrating the fragment storage process.
A flowchart illustrating the object read procedure.
A diagram showing a specific example of the elimination of duplicate fragments.
A diagram illustrating the fingerprint set.
A diagram illustrating the storage format and search method of the fingerprint set.
A flowchart illustrating deduplication at a free position.
A diagram illustrating the first fragment read process in deduplication.
A diagram illustrating the second fragment read process in deduplication.
A diagram showing a first example of deduplication at a free position.
A diagram showing a second example of deduplication at a free position.
A diagram showing a first example of object rearrangement by the front-end server.
A diagram showing a second example of object rearrangement by the front-end server.

Embodiments of the present invention are described below with reference to the drawings. In the following description, functions and components that are approximately or substantially the same are denoted by the same reference numerals and are described as necessary.

[First Embodiment]
Object storage, which is used in large-scale storage, manages the data to be stored in units of abstracted objects. It is therefore not constrained by the type or number of storage hardware devices, such as HDDs (Hard Disk Drives) or SSDs (Solid State Drives), or by file system specifications. As a result, object storage is highly scalable, in that hardware replacement and capacity expansion are easy, and a large-scale storage system can be built at low cost.

The information processing system according to the present embodiment uses object storage and achieves high fault tolerance by means of erasure coding. The system also applies deduplication to the data stored across the entire system while maintaining the reliability provided by the erasure code, thereby reducing the required storage capacity, in other words improving storage efficiency.

In the present embodiment, reliability against physical disk failures is maintained by applying deduplication while managing the sets of objects and erasure codes that are distributed and stored across a plurality of physical disks.

To maintain the scalability of storage capacity, the information processing system according to the present embodiment assumes the use of highly functional drives such as IP (Internet Protocol) connected drives. However, drives that are not IP-connected may be used as long as a plurality of them can be connected. A drive may include, for example, a nonvolatile semiconductor memory. The nonvolatile memory is, for example, a NAND flash memory, but may be another nonvolatile semiconductor memory such as a NOR flash memory, an MRAM (Magnetoresistive Random Access Memory), a PRAM (Phase change Random Access Memory), a ReRAM (Resistive Random Access Memory), or an FeRAM (Ferroelectric Random Access Memory). The nonvolatile memory may also be a nonvolatile memory other than a semiconductor memory, such as a magnetic memory. It may be a flash memory with a three-dimensional structure, or a disk (for example, a disc or a disk).

In the information processing system according to the present embodiment, a plurality of drives share the work of searching for duplicate positions. This reduces the load placed on the front-end server as the amount of stored data grows.

In the present embodiment, a terminal used by a user (or an application running on the terminal) stores or retrieves objects using a predetermined protocol such as a REST (Representational State Transfer) API (Application Programming Interface). The information processing apparatus responsible for this API is called the front-end server. When the front-end server receives an object storage request from the user's terminal or from an application on that terminal, it divides the object into an appropriate number of pieces (hereinafter called strides) and encodes each piece using an erasure code. A stride has the duplicate detection size.

Through the encoding, each stride is divided into the minimum units to which the erasure code applies (hereinafter called fragments). A fragment has the storage unit size. The front-end server stores these fragments in a plurality of fragment servers, which are the final storage destinations of the data, using, for example, a KVS (Key-Value Store) API.

The fragment servers then deduplicate the stored fragments. For example, deduplication is performed in units of strides in order to maintain the reliability of the erasure code.

Note that the API used to exchange data between the user's terminal and the front-end server, and the API used to exchange data between the front-end server and the fragment servers, are not limited to those described above.

FIG. 1 is a block diagram illustrating the configuration of the information processing system according to the present embodiment.

The information processing system 1 includes one or more front-end servers 2 and a plurality of fragment servers 3. FIG. 1 describes the case of a single front-end server 2, and one of the plurality of fragment servers 3 is described as a representative.

The front-end server 2 generates a plurality of strides from an object, generates a plurality of fragments from each of the strides, and distributes the fragments among the plurality of fragment servers 3.

Each of the fragment servers 3 determines whether at least part of a fragment stored in its own storage unit 31 matches at least part of a received fragment. If they match, the fragment server 3 determines whether the one-stride portion of data that includes the stored fragment and is stored across the plurality of fragment servers 3 matches the one-stride portion of data that includes the received fragment and is contained in the object. If the stored one-stride portion matches the one-stride portion contained in the object, the fragment server 3 transmits duplicate position information, which indicates the position of the duplicate within the object, to the front-end server 2.
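The two-stage check described above (a cheap fragment-level comparison first, then a stride-level comparison only for candidates) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: SHA-256 stands in for the unspecified hash function, and `FingerprintEntry`, `stored_index`, and `find_duplicate_offsets` are hypothetical names introduced here. The sketch also assumes the stride-level hash of the stored data is already available in a local index, whereas the source describes the stored stride as spread across several fragment servers.

```python
import hashlib
from dataclasses import dataclass


def fp(data: bytes) -> str:
    """Fingerprint of a byte string (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class FingerprintEntry:
    stride_fp: str    # hash of one whole stride of the incoming object
    fragment_fp: str  # hash of (part of) one fragment of that stride
    offset: int       # position of the stride within the incoming object


def find_duplicate_offsets(entries, stored_index):
    """Return offsets of incoming strides that duplicate stored strides.

    stored_index maps a stored fragment fingerprint to the set of
    fingerprints of stored strides that contain that fragment.
    """
    duplicates = []
    for entry in entries:
        # Stage 1: does any stored fragment match at fragment level?
        candidate_strides = stored_index.get(entry.fragment_fp)
        if candidate_strides is None:
            continue
        # Stage 2: confirm the match over the whole stride.
        if entry.stride_fp in candidate_strides:
            duplicates.append(entry.offset)
    return duplicates
```

A stride that shares only one fragment with stored data passes stage 1 but is rejected at stage 2, so only whole-stride duplicates are reported.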

Based on the duplicate position information, the front-end server 2 regenerates a plurality of strides from the object so that the duplicate portion is contained in a single stride, regenerates a plurality of fragments from each of the regenerated strides, and distributes the regenerated fragments among the plurality of fragment servers 3.

Each of the fragment servers 3 performs deduplication when a regenerated and received fragment matches a fragment stored in its own storage unit 31.

The front-end server 2 includes a controller 20 and a storage unit 21.

The controller 20 includes a transmission/reception unit 22, a processor 23, a memory 24, and a control unit 25.

The transmission/reception unit 22 exchanges, for example, commands, addresses, data, information, instructions, and signals with the terminal 11 used by the user U (or an application running on the terminal 11) and with the fragment servers 3.

The terminal 11 used by the user U may be, for example, a computer, a PDA (Personal Digital Assistant), a smartphone, or a tablet terminal.

The processor 23 executes control processing and arithmetic processing based on instructions from the transmission/reception unit 22 and the control unit 25. The processor 23 is, for example, a CPU (Central Processing Unit), an MPU (Micro-Processing Unit), or a DSP (Digital Signal Processor).

The memory 24 is a main storage device and operates under the control of the processor 23. Under that control, the memory 24 temporarily stores data such as data sent or received by the transmission/reception unit 22 and data generated by the control unit 25.

The control unit 25 includes an object division unit 201, an erasure code calculation unit 202, and a fingerprint calculation unit 203.

The object division unit 201 divides an object, received from the terminal 11 used by the user U via the transmission/reception unit 22, into strides. The stride size is determined arbitrarily by the system.

If the object size is not an integer multiple of the stride size, meaningless padding data (for example, zero data) is appended to the object until its size becomes an integer multiple; that is, the object is padded. Padding may also be implemented by dividing the object into a plurality of strides and then filling any stride that falls short of the predetermined size with padding data.
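The stride division with padding can be sketched as below. This follows the second variant described above (split first, then fill the short stride). The function name `split_into_strides` and the choice of a zero byte as the padding value are assumptions made for illustration.

```python
def split_into_strides(obj: bytes, stride_size: int) -> list[bytes]:
    """Split obj into stride_size chunks, zero-padding the last one."""
    if stride_size <= 0:
        raise ValueError("stride_size must be positive")
    strides = [obj[i:i + stride_size] for i in range(0, len(obj), stride_size)]
    if strides and len(strides[-1]) < stride_size:
        # Fill the short final stride with meaningless zero data.
        strides[-1] = strides[-1].ljust(stride_size, b"\x00")
    return strides
```

For example, `split_into_strides(b"abcdefghij", 4)` returns `[b"abcd", b"efgh", b"ij\x00\x00"]`. A real system would also need to record the original object length so that the padding can be removed on read; that bookkeeping is omitted here.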

The erasure code calculation unit 202 performs encoding that computes an erasure code for each stride obtained by the object division unit 201. Specifically, it generates a plurality of fragments from the stride. The fragments corresponding to a stride comprise an information symbol part and a parity symbol part. The erasure code is described in detail later with reference to FIG. 3.

The fingerprint calculation unit 203 calculates a fingerprint for arbitrary data. The data may be, for example, a stride, a fragment, or half of a fragment (called a 1/2 fragment). A fingerprint is, for example, a hash value, that is, a value uniquely determined by the content of the data. For example, strides consisting of identical data always have identical fingerprints. Conversely, identical fingerprints indicate that the data is very likely identical, so comparing fingerprints is an effective way to search for duplicate data during deduplication. In the present embodiment, whether pieces of data match is determined by comparing their fingerprints.
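As a rough illustration of fingerprinting at different granularities, the sketch below hashes a whole unit and the two halves of a fragment. SHA-256 is an assumption (the source only says the fingerprint is, for example, a hash value), and the function names are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Content-derived fingerprint; identical data gives identical values."""
    return hashlib.sha256(data).hexdigest()


def half_fragment_fingerprints(fragment: bytes) -> tuple[str, str]:
    """Fingerprints of the first and second half of a fragment."""
    half = len(fragment) // 2
    return fingerprint(fragment[:half]), fingerprint(fragment[half:])
```

Equal fingerprints only make equality very likely, as the text notes. A deployment that cannot tolerate hash collisions would follow a fingerprint match with a byte-for-byte comparison.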

The fingerprint calculation unit 203 also stores the calculated fingerprints in the storage unit 21.

The storage unit 21 includes an object storage unit 211, a fingerprint set 212, and a disk selection index file 213.

The object storage unit 211 stores the objects received by the transmission/reception unit 22.

The fingerprint set 212 is the collection of fingerprints of the strides or fragments used in deduplication. The fingerprint set 212 may be, for example, a collection of one or more files.

Once a certain number of fingerprints have been recorded, the fingerprint set 212 is transmitted from the transmission/reception unit 22 of the controller 20 to the fragment server 3.

The disk selection index file 213 indicates the sets of fragment servers that serve as storage destinations for fragments. It associates each reference hash value with the storage destination identification information determined for that reference hash value. The control unit 25 determines the storage destination of each fragment by looking up the disk selection index file 213 with a key derived from the fingerprint calculated by the fingerprint calculation unit 203. The key may be computed from the fingerprint using, for example, a hash function.

How the storage destination of a fragment is determined using the disk selection index file 213 is described later with reference to FIG. 4. The disk selection index file 213 need not be stored in file form.
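The lookup can be pictured as a table keyed by reference hash values. The sketch below is one possible reading, not the format used in the patent: it assumes the index is a dictionary whose keys are the integers 0 to n-1, derives the key by hashing the fingerprint with SHA-256 and reducing it modulo n, and uses the hypothetical names `index_key` and `select_disk_set`.

```python
import hashlib


def index_key(fingerprint: str, num_entries: int) -> int:
    """Map a fingerprint to a reference hash value in [0, num_entries)."""
    if num_entries <= 0:
        raise ValueError("index must contain at least one entry")
    digest = hashlib.sha256(fingerprint.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_entries


def select_disk_set(fingerprint: str, index: dict[int, list[str]]) -> list[str]:
    """Return the disk IDs (storage destinations) for a fingerprint."""
    return index[index_key(fingerprint, len(index))]
```

Because the key depends only on the fingerprint, every front-end server holding the same index selects the same destinations for the same data, without consulting a central directory.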

The fragment server 3 includes a controller 30 and a storage unit 31.

The controller 30 includes a transmission/reception unit 32, a processor 33, a memory 34, and a control unit 35.

The transmission/reception unit 32 exchanges, for example, commands, addresses, data, information, instructions, and signals with the front-end server 2.

The processor 33 executes control processing and arithmetic processing based on instructions from the transmission/reception unit 32 and the control unit 35. Like the processor 23, the processor 33 is, for example, a CPU, an MPU, or a DSP.

The memory 34 is a main storage device and operates under the control of the processor 33. Under that control, the memory 34 temporarily stores data such as data sent or received by the transmission/reception unit 32 and data generated by the control unit 35.

The control unit 35 includes a deduplication processing unit 301.

The deduplication processing unit 301 uses the fingerprint set 311 to search the fragments stored in the fragment storage unit 312 for duplicate portions and performs deduplication.

Because deduplication is performed in units of strides, when, for example, a duplicate portion begins in the middle of a stride, the front-end server 2 rearranges the strides by splitting the stride at the start of the duplicate portion so that the start of a stride coincides with the start of the duplicate portion. When strides need to be rearranged for deduplication in this way, the deduplication processing unit 301 transmits a stride rearrangement instruction to the front-end server 2. Deduplication is described in detail later.
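The rearrangement can be sketched as re-cutting the object so that a stride boundary falls exactly at the reported start of the duplicate. The function `realign_strides` and its zero padding of the shortened stride are assumptions made for illustration; the source defers the actual procedure to a later part of the description.

```python
def realign_strides(obj: bytes, stride_size: int, dup_start: int) -> list[bytes]:
    """Re-cut obj so that a stride begins exactly at dup_start."""
    if stride_size <= 0:
        raise ValueError("stride_size must be positive")
    if not 0 <= dup_start <= len(obj):
        raise ValueError("dup_start must lie within the object")

    def chunks(data: bytes) -> list[bytes]:
        out = [data[i:i + stride_size] for i in range(0, len(data), stride_size)]
        if out and len(out[-1]) < stride_size:
            # Pad a short stride with zero data (assumed padding rule).
            out[-1] = out[-1].ljust(stride_size, b"\x00")
        return out

    # Strides before the duplicate, then strides aligned to its start.
    return chunks(obj[:dup_start]) + chunks(obj[dup_start:])
```

For example, `realign_strides(b"xxABCDEFGH", 4, 2)` returns `[b"xx\x00\x00", b"ABCD", b"EFGH"]`, so the duplicate data starting at offset 2 now occupies whole strides and can be matched at stride granularity.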

The storage unit 31 includes a fingerprint set 311, a fragment storage unit 312, and duplicate position information 313. In the following description of the present embodiment, the storage unit 31 is assumed to be a disk. This does not mean that the storage unit 31 is limited to a disk; various nonvolatile memories can be used as the storage unit 31.

The fingerprint set 311 is received from the front-end server 2 via the transmission/reception unit 32 of the controller 30 of the fragment server 3 and is stored in the storage unit 31. The fingerprint set 311 is referenced during deduplication.

The fragment storage unit 312 stores the fragments received from the front-end server 2 via the transmission/reception unit 32 of the controller 30 of the fragment server 3.

The duplicate position information 313 is information indicating the presence of duplicate data in an object to be stored. It includes, for example, the position of the duplicate data that the front-end server 2 refers to when rearranging strides. More specifically, the duplicate position information 313 includes the starting point of data that the deduplication processing unit 301, as a result of duplicate detection, has determined to contain one stride's worth of duplicate data. Once a certain amount of duplicate position information 313 has been stored, the control unit 35 transmits it to the front-end server 2 via the transmission/reception unit 32.

Deduplication is described in detail later.

FIG. 2 is a conceptual diagram illustrating the configuration of the information processing system 1 according to the present embodiment.

The information processing system 1 includes a plurality of front-end servers 2 and a plurality of fragment servers 3. Adding front-end servers 2 increases the capacity to process data received from the terminal 11 used by the user U and from the fragment servers 3; that is, it increases the processing capacity of the information processing system 1 as a whole.

The terminal 11 of the user U, or an application on the terminal 11, sends an object OB to one of the plurality of front-end servers 2 using, for example, a REST API. The object may be, for example, a file.

In the present embodiment, the number of fragment servers 3 is equal to or greater than the number of fragments generated from one stride by erasure coding. Adding fragment servers 3 increases the number of final storage destinations for the object OB; that is, it increases the storage capacity of the information processing system 1 as a whole.

First, the front-end server 2 divides the object OB into a plurality of strides ST. Next, the front-end server 2 encodes each stride ST with an erasure code to generate a plurality of fragments. The fragments comprise an information symbol part IS and a parity symbol part PS. In FIG. 2, an information symbol part IS containing four fragments and a parity symbol part PS containing two fragments are generated from one stride. The number of fragments in the information symbol part IS and the number of fragments in the parity symbol part PS can each be set to any value of one or more.

The front-end server 2 selects the destination fragment server 3 for each generated fragment based on the disk selection index file 213 and, according to the selection, sends each fragment to its destination fragment server 3 using, for example, a KVS API.

図３は、オブジェクトＯＢとフラグメントサーバ３のディスク３１との関係の一例を示す概念図である。図３では、オブジェクトＯＢから複数のストライドＳＴへの分割は省略している。 FIG. 3 is a conceptual diagram showing an example of the relationship between the object OB and the disk 31 of the fragment server 3. In FIG. 3, the division from the object OB into a plurality of strides ST is omitted.

オブジェクトＯＢは、分割され、消失符号を用いて符号化される。符号化された結果、ｋ個の情報シンボル部ＩＳと、ｍ個のパリティシンボル部ＰＳのフラグメントが生成される。 The object OB is divided and encoded using an erasure code. As a result of encoding, fragments of k information symbol parts IS and m parity symbol parts PS are generated.

個々のフラグメントは、ｋ＋ｍ個の異なるフラグメントサーバ３のディスク３１へ保存される。このようにして保存されたデータは、ｍ個までの任意のフラグメントを喪失しても元データであるオブジェクトＯＢを復元できる。 Individual fragments are stored in the disks 31 of k + m different fragment servers 3. The data stored in this way can restore the original object OB even if up to m arbitrary fragments are lost.

この図３は、ｋ＝６、ｍ＝３とした場合を例示している。この場合は、フラグメントをそれぞれ９台以上の異なるフラグメントサーバ３のディスク３１へ保存する。そして、任意の３個のフラグメントの喪失、例えば３台のフラグメントサーバ３の故障まで許容される。 FIG. 3 illustrates the case where k = 6 and m = 3. In this case, the fragments are stored on the disks 31 of nine or more different fragment servers 3. The loss of any three fragments, for example the failure of three fragment servers 3, is then tolerated.

なお、消失符号化後の保存容量の増加率Ｒ（％）は、Ｒ＝（ｍ／ｋ）×１００で求められる。図３の例では、保存容量の増加率Ｒは５０％である。 The increase rate R (%) of the storage capacity after erasure coding is obtained by R = (m / k) × 100. In the example of FIG. 3, the increase rate R of the storage capacity is 50%.
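The relation R = (m / k) × 100 can be checked with a few lines of Python. This is only an illustrative sketch; the function name `storage_increase_rate` is an assumption made for this example and does not appear in the embodiment.

```python
def storage_increase_rate(k: int, m: int) -> float:
    """Return the storage increase rate R (%) = (m / k) * 100.

    k is the number of information-symbol fragments and m is the number
    of parity-symbol fragments; both must be one or more.
    """
    if k < 1 or m < 1:
        raise ValueError("k and m must each be one or more")
    return m / k * 100


# FIG. 3 example: k = 6, m = 3
rate_fig3 = storage_increase_rate(6, 3)
# FIG. 2 example: k = 4, m = 2
rate_fig2 = storage_increase_rate(4, 2)
```

Both the k = 6, m = 3 configuration of FIG. 3 and the k = 4, m = 2 configuration of FIG. 2 give the same rate, because the rate depends only on the ratio m / k.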

図４は、ディスク選択インデックスファイル２１３を例示する図である。 FIG. 4 is a diagram illustrating the disk selection index file 213.

ディスク選択インデックスファイル２１３は、フラグメントの保存先となるフラグメントサーバ３の集合を示す。各フラグメントサーバ３のディスク３１には、各ディスク３１を識別する固有のＩＤ（以下、ディスクＩＤとする）が付与される。フラグメントの保存先となるフラグメントサーバ３の集合は、ディスクＩＤの集合で表され、その集合ごとに固有のＩＤ（以下、ディスクセットＩＤとする）が付与される。 The disk selection index file 213 indicates a set of fragment servers 3 that are fragment storage destinations. Each disk 31 of each fragment server 3 is given a unique ID for identifying each disk 31 (hereinafter referred to as a disk ID). A set of fragment servers 3 serving as a fragment storage destination is represented by a set of disk IDs, and a unique ID (hereinafter referred to as a disk set ID) is assigned to each set.

ディスク選択インデックスファイル２１３は、フラグメントサーバ３の追加又は削除の際に事前に計算される。各フロントエンドサーバ２は、ディスク選択インデックスファイル２１３を保持する。 The disk selection index file 213 is calculated in advance when the fragment server 3 is added or deleted. Each front-end server 2 holds a disk selection index file 213.

例えば、ディスク選択インデックスファイル２１３におけるそのディスクＩＤの出現確率は各フラグメントサーバ３のデータ記憶容量に比例する。例えばディスクＩＤ＝ｘのフラグメントサーバ３のデータ記憶容量が１ＴＢ、ディスクＩＤ＝ｙのフラグメントサーバ３のデータ記憶容量が２ＴＢである場合、ｙの出現確率はｘの２倍となる。また、ディスク選択インデックスファイル２１３において、各ディスクＩＤは、この出現確率を満たした上でランダムに分布している。ただし、各フラグメントは必ず異なるディスクへ保存されるため、同一の列には必ず異なるディスクＩＤが現れ、同じディスクＩＤは同じ列に２以上含まれない。 For example, the appearance probability of the disk ID in the disk selection index file 213 is proportional to the data storage capacity of each fragment server 3. For example, when the data storage capacity of the fragment server 3 with disk ID = x is 1 TB and the data storage capacity of the fragment server 3 with disk ID = y is 2 TB, the appearance probability of y is twice that of x. In the disk selection index file 213, each disk ID is randomly distributed while satisfying the appearance probability. However, since each fragment is always stored in a different disk, different disk IDs always appear in the same column, and two or more same disk IDs are not included in the same column.

ディスク選択インデックスファイル２１３をあるディスクセットＩＤ（ｚとする）に基づいて参照する、といった場合、ディスク選択インデックスファイル２１３のｚ列の集合値が返却される。この集合をディスクセットと呼ぶ。前述のとおり、ディスクセットには複数のディスクＩＤが含まれるが、１つのディスクセット上でディスクＩＤが重複することはない。例えば、ディスクセットＩＤ＝３の保存先は[０,３,８,６,７,４, ... ]となる。 When the disk selection index file 213 is referred to based on a certain disk set ID (denoted z), the set of values in column z of the disk selection index file 213 is returned. This set is called a disk set. As described above, a disk set includes a plurality of disk IDs, but no disk ID appears more than once within one disk set. For example, the storage destinations for disk set ID = 3 are [0, 3, 8, 6, 7, 4, ...].

フロントエンドサーバ２は、保存したいデータ（具体的にはあるストライドから生成されたフラグメントの集合）に対してディスクセットＩＤを決定し、決定されたディスクセットＩＤに基づいてディスク選択インデックスファイル２１３を参照し、ディスクセットＩＤに対応するディスクＩＤに従ってデータを保存する。これにより、各フラグメントを必ず異なるフラグメントサーバ３へ保存できる。また、ディスクＩＤの出現確率はフラグメントサーバ３のデータ記憶容量に比例している。このため、データ記憶容量の異なるフラグメントサーバ３があっても、フラグメントサーバ３のデータ記憶容量に従ってフラグメントが適度な分布で保存される。 The front-end server 2 determines a disk set ID for data to be stored (specifically, a set of fragments generated from a certain stride), and refers to the disk selection index file 213 based on the determined disk set ID. Then, the data is stored according to the disk ID corresponding to the disk set ID. Thereby, each fragment can always be stored in a different fragment server 3. The appearance probability of the disk ID is proportional to the data storage capacity of the fragment server 3. For this reason, even if there are fragment servers 3 having different data storage capacities, the fragments are stored in an appropriate distribution according to the data storage capacities of the fragment servers 3.
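The properties described for the disk selection index file 213 (capacity-weighted appearance of disk IDs, random distribution, and no repeated disk ID within one column) can be sketched as follows. The embodiment does not state how the file is actually computed, so the sampling procedure, the function name `build_disk_index`, and its parameters are assumptions for illustration only. Drawing without replacement inside a column only approximates strict proportionality to capacity.

```python
import random
from typing import Dict, List


def build_disk_index(capacities: Dict[int, float], width: int,
                     columns: int, seed: int = 0) -> List[List[int]]:
    """Build `columns` disk sets, each holding `width` distinct disk IDs.

    Within each column, disks are drawn one at a time with probability
    proportional to capacity, and a drawn disk is removed from the pool
    so that the same disk ID never appears twice in one disk set.
    """
    if width > len(capacities):
        raise ValueError("a disk set cannot be wider than the number of disks")
    rng = random.Random(seed)
    index = []
    for _ in range(columns):
        remaining = dict(capacities)
        disk_set = []
        for _ in range(width):
            ids = list(remaining)
            weights = [remaining[i] for i in ids]
            chosen = rng.choices(ids, weights=weights, k=1)[0]
            disk_set.append(chosen)
            del remaining[chosen]
        index.append(disk_set)
    return index


# Ten disks of 1 TB plus one disk of 2 TB (ID 10), disk sets of k + m = 6.
capacities = {i: 1.0 for i in range(10)}
capacities[10] = 2.0
index = build_disk_index(capacities, width=6, columns=1000)
```

Referring to the index by a disk set ID z then amounts to reading `index[z]`.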

図５は、重複排除方法の第１乃至第３の方式を例示する図である。同じ符号を付したデータ内容は、同じであるとする。 FIG. 5 is a diagram illustrating the first to third methods of the deduplication method. It is assumed that data contents with the same reference numerals are the same.

オブジェクトＯ１とオブジェクトＯ２は、先頭のデータ内容Ａ、及び中間のデータ内容Ｂは同一であるものの、末尾のデータ内容がＣとＺとで異なっている。オブジェクトＯ３は、オブジェクトＯ２に対して、先頭にデータ内容Ｙが挿入され、またデータ内容Ａとデータ内容Ｂの間にＸが挿入され、データ内容Ｚが削除された点で異なっている。 Object O1 and object O2 have the same data content A at the beginning and data content B in the middle, but the data content at the end differs between C and Z. Object O3 differs from object O2 in that data content Y is inserted at the beginning, X is inserted between data content A and data content B, and data content Z is deleted.

オブジェクトストレージにおける重複排除には、例えば第１乃至第３の方式が考えられる。 For example, the first to third methods can be considered for deduplication in the object storage.

第１の方式は、オブジェクト（例えばファイル）単位での重複排除である。しかしながら、第１の方式はある２つのオブジェクトにおいて、例えば１ビットの差異があっただけで重複排除を行うことができなくなり、重複排除の効率は高くない。 The first method is deduplication in units of objects (for example, files). However, in the first method, for example, deduplication cannot be performed with only one bit difference between two objects, and deduplication efficiency is not high.

図５では、オブジェクトＯ１とオブジェクトＯ２は、重複するデータ内容Ａ，Ｂを含むが、互いに重複しないデータ内容Ｃ，Ｚを含むため、不一致である。また、オブジェクトＯ２とオブジェクトＯ３は、重複するデータ内容Ａ，Ｂを含むが、オブジェクトＯ２はオブジェクトＯ３が含まないデータ内容Ｚを含み、オブジェクトＯ３はオブジェクトＯ２が含まないデータ内容Ｙ，Ｘを含むため、不一致である。したがって、第１の方式において、オブジェクトＯ１〜Ｏ３の間で重複排除は実行されない。 In FIG. 5, object O1 and object O2 both include data contents A and B, but they do not match because they also include data contents C and Z, which are not shared. Likewise, object O2 and object O3 both include data contents A and B, but object O2 includes data content Z, which object O3 does not include, and object O3 includes data contents Y and X, which object O2 does not include, so they do not match. Therefore, in the first method, no deduplication is performed among objects O1 to O3.

第２の方式は、重複排除を行う固定長でオブジェクトを分割し、その固定長単位で重複排除を行う方式である（以下、固定位置による重複排除と呼ぶ）。この第２の方式では、複数のオブジェクトにおける同じ位置に現れる固定長単位の重複は排除できる。しかしながら、挿入などにより同じデータ内容が複数のオブジェクトの異なる位置にある場合にはこの同じデータ内容の重複を排除することはできない。例えば、第２の方式では、前述の１ビットの差異がある部分には重複排除が適用されないが、同じ位置に固定長単位で同じデータ内容があれば、この同じ位置で同じデータ内容の重複排除が行われる。図５の第２の方式では、オブジェクトＯ１とオブジェクトＯ２との間で、データ内容Ａとデータ内容Ｂとは、位置と値が一致するため重複排除可能である。オブジェクトＯ２とオブジェクトＯ３との間で、データ内容Ａ，Ｂは同じ値であるが、位置が異なる。このため、オブジェクトＯ２とオブジェクトＯ３との間で、固定長単位でデータ内容は不一致となり、重複排除されない。第２の方式は、第１の方式に対してより多くの重複排除が行える。しかしながら、この第２の方式は、例えばあるファイルの先頭に１バイトのデータが挿入された場合に、重複排除されない。 The second method divides an object into fixed-length units and performs deduplication in those fixed-length units (hereinafter referred to as fixed-position deduplication). The second method can eliminate duplicate fixed-length units that appear at the same position in a plurality of objects. However, when the same data content is located at different positions in the objects, for example because of an insertion, that duplication cannot be eliminated. For example, in the second method, deduplication is not applied to a portion containing the one-bit difference described above, but if the same data content exists in a fixed-length unit at the same position, that data content is deduplicated at that position. In the second method of FIG. 5, data content A and data content B can be deduplicated between object O1 and object O2 because their positions and values match. Between object O2 and object O3, data contents A and B have the same values but different positions. The data contents therefore do not match in fixed-length units between object O2 and object O3, and they are not deduplicated. The second method can deduplicate more than the first method. However, in the second method, deduplication is not performed when, for example, one byte of data is inserted at the head of a file.

第３の方式は、重複部分の検出位置を固定せず、自由位置で重複排除を行う方式である。（以下、自由位置による重複排除と呼ぶ）。この第３の方式では、たとえオブジェクトにおける位置が異なっていても固定長単位でデータ内容が同じであれば重複する部分を排除できる。図５の第３の方式では、オブジェクトＯ１とオブジェクトＯ２との間で、データ内容Ａ及びデータ内容Ｂの重複排除が可能である。オブジェクトＯ２とオブジェクトＯ３との間で、位置が異なるが一致するデータ内容Ａ，Ｂの重複排除が可能である。第３の方式では、例えば前述のように１バイトのデータが挿入された場合でも、その１バイトのデータのみが重複排除の対象にならず、残りの部分は重複排除の対象となり、高い重複排除効果が得られる。 The third method does not fix the position at which duplicates are detected and performs deduplication at free positions (hereinafter referred to as free-position deduplication). In the third method, even if the positions in the objects differ, duplicated portions can be eliminated as long as the data content is the same in fixed-length units. In the third method of FIG. 5, data content A and data content B can be deduplicated between object O1 and object O2. Between object O2 and object O3, data contents A and B, which match but are at different positions, can also be deduplicated. In the third method, even when one byte of data is inserted as described above, only that one byte is excluded from deduplication, the remainder is subject to deduplication, and a high deduplication effect is obtained.
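The difference between the second method (fixed position) and the third method (free position) can be reproduced on the objects of FIG. 5 with a short sketch. The byte values chosen for A, B, C, Z, X and Y, the stride length of 4, the one-byte length of X and Y, and the function names are all assumptions for illustration. The naive `bytes.find` search stands in for whatever matching the system actually performs.

```python
from typing import List, Tuple


def fixed_position_matches(a: bytes, b: bytes, stride: int) -> List[int]:
    """Offsets (multiples of stride) at which a and b hold the same block."""
    limit = min(len(a), len(b)) - stride + 1
    return [off for off in range(0, max(limit, 0), stride)
            if a[off:off + stride] == b[off:off + stride]]


def free_position_matches(a: bytes, b: bytes, stride: int) -> List[Tuple[int, int]]:
    """Pairs (offset in a, offset in b) where a stride-aligned block of a
    is found at any byte offset of b."""
    result = []
    for off_a in range(0, len(a) - stride + 1, stride):
        pos = b.find(a[off_a:off_a + stride])
        if pos >= 0:
            result.append((off_a, pos))
    return result


A, B, C, Z = b"AAAA", b"BBBB", b"CCCC", b"ZZZZ"
o1 = A + B + C
o2 = A + B + Z
o3 = b"Y" + A + b"X" + B   # Y inserted at the head, X between A and B, Z removed

fixed_o1_o2 = fixed_position_matches(o1, o2, 4)   # A and B match in place
fixed_o2_o3 = fixed_position_matches(o2, o3, 4)   # nothing matches in place
free_o2_o3 = free_position_matches(o2, o3, 4)     # A and B are found at shifted offsets
```

With these inputs, the fixed-position comparison finds A and B between O1 and O2 but nothing between O2 and O3, while the free-position comparison still finds A and B in O3 at byte offsets 1 and 6.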

第３の方式では、オブジェクト間で一致か否かを判断する単位である固定長にも依存するが、ファイル全体に対する重複排除である第１の方式と比較して１５％から２０％程度のデータ量を削減可能である。 In the third method, although the result depends on the fixed length used as the unit for judging whether objects match, the data amount can be reduced by about 15% to 20% compared with the first method, which deduplicates whole files.

本実施形態に係る情報処理システム１では、オブジェクト単位で重複排除を行う第１の方式、固定長で同じデータ内容がオブジェクトにおける同じ位置に配置されている場合に重複排除を行う第２の方式、オブジェクトにおける位置が同じであっても異なる場合であっても固定長で同じデータ内容を重複排除する第３の方式を併用可能である。 In the information processing system 1 according to the present embodiment, the first method, which deduplicates in units of objects, the second method, which deduplicates when the same fixed-length data content is located at the same position in the objects, and the third method, which deduplicates the same fixed-length data content whether or not its position in the objects is the same, can be used in combination.

なお、以下では重複を排除するために一致か否かを判断する固定長（重複排除単位）をストライド長と呼ぶ。図５において、データ内容Ａ，Ｂ，Ｃ，Ｚのバイト長は、ストライド長である。 Hereinafter, a fixed length (duplication elimination unit) for determining whether or not they match in order to eliminate duplication will be referred to as stride length. In FIG. 5, the byte lengths of the data contents A, B, C, and Z are stride lengths.

図６は、通常のハッシュ計算とローリングハッシュ計算との比較結果を例示する図である。 FIG. 6 is a diagram illustrating a comparison result between normal hash calculation and rolling hash calculation.

本実施形態において、例えば、フィンガープリントの計算には、ローリングハッシュと呼ばれる方法を用いる。ローリングハッシュは、通常のハッシュ計算と比べて、計算量を低減することができる。しかしながら、ローリングハッシュに代えて、データ内容が一致しているか否か判断可能な他の方法を用いてもよい。 In the present embodiment, for example, a method called rolling hash is used for calculating the fingerprint. A rolling hash can reduce the amount of calculation compared with a normal hash calculation. However, instead of the rolling hash, another method that can determine whether or not the data contents match may be used.

通常のハッシュ関数Ｈで固定位置の重複排除を行う場合は、ストライド毎のフィンガープリントを計算すればよく、例えばハッシュ関数Ｈにストライドの文字列を引数として渡し、計算させる。この場合の計算量は、文字列の長さをｎ（ｎは整数）とすると、Ｏ（ｎ）である。 When deduplication of fixed positions is performed using the normal hash function H, the fingerprint for each stride may be calculated. For example, the character string of the stride is passed to the hash function H as an argument to be calculated. The amount of calculation in this case is O (n), where n is the length of the character string (n is an integer).

しかしながら、通常のハッシュ関数Ｈで自由位置の重複排除を行う際には、重複排除を行う文字列の箇所を、例えば１バイト（１文字）ずつずらしながら探索を行うため、文字列長ｎの全ての位置でハッシュ関数Ｈにより計算を行うと、計算量はＯ（ｎ^２）となる。 However, when free-position deduplication is performed with the ordinary hash function H, the search proceeds while shifting the position of the character string to be deduplicated by, for example, one byte (one character) at a time. If the hash function H is computed at every position of a character string of length n, the calculation amount is O(n^2).

これに対し、ローリングハッシュ関数Ｒｕは、ある長さＬについてのハッシュ値（フィンガープリント）に対して、例えば１バイト付加した値及び１バイト削除した値を計算することができる関数である。このローリングハッシュ関数Ｒｕを用いることにより、最初の文字列のハッシュ値の計算より後の１回あたりのハッシュ値の計算量をＯ（１）に低減することができる。 On the other hand, the rolling hash function Ru is a function that can calculate, for example, a value obtained by adding 1 byte and a value obtained by deleting 1 byte with respect to a hash value (fingerprint) for a certain length L. By using this rolling hash function Ru, it is possible to reduce the amount of calculation of the hash value per time after the calculation of the hash value of the first character string to O (1).

図６では、例えば、文字列長ｎの文字列「ＡＢＣＤＥＦＧＨ…」に対し、Ｌ＝５の文字列のフィンガープリントを計算する場合を例示する。 For example, FIG. 6 illustrates a case where the fingerprint of a character string of L = 5 is calculated for a character string “ABCDEFGH...” Having a character string length n.

図６において、Ｓｔを内部状態、Ｃｔをハッシュ値とする（ｔは整数）。通常のハッシュ関数Ｈ、ローリングハッシュ関数Ｒｕは、それぞれハッシュ計算結果として、（Ｓｔ，Ｃｔ）の組み合わせを出力する。 In FIG. 6, St is an internal state and Ct is a hash value (t is an integer). The normal hash function H and the rolling hash function Ru each output a combination of (St, Ct) as a hash calculation result.

通常のハッシュ関数Ｈを用いる場合、最初の文字列「ＡＢＣＤＥ」に対する計算量はＯ（ｎ）である。１文字ずらして得られる全ての文字列の組み合わせで同じ計算量、すなわちＯ（ｎ）が必要であることから、全体で近似的にＯ（ｎ^２）の計算量が必要となる。 When the ordinary hash function H is used, the calculation amount for the first character string "ABCDE" is O(n). Every character string obtained by shifting by one character requires the same calculation amount, that is, O(n), so a calculation amount of approximately O(n^2) is required in total.

これに対し、ローリングハッシュ関数Ｒｕを用いる場合は、最初の文字列「ＡＢＣＤＥ」に対しては、通常のハッシュ関数Ｈを用いるため、計算量はＯ（ｎ）である。しかしながら、次の文字列「ＢＣＤＥＦ」に対しては、ローリングハッシュ関数Ｒｕに対して、最初の計算で得られた内部状態Ｓ０と、最初の文字列に対しＡを除く旨の引数（-”Ａ”）、及びＦを追加する旨の引数（+”Ｆ”）を与えることにより、計算Ｏ（１）で計算可能である。残りの文字列に対しても同様にＯ（１）の計算量となるため、全体では近似的にＯ（ｎ）となり、ハッシュ関数Ｈのみを用いた場合に比べ、計算量を大幅に低減することができる。 In contrast, when the rolling hash function Ru is used, the ordinary hash function H is used for the first character string "ABCDE", so the calculation amount is O(n). For the next character string "BCDEF", however, the hash can be computed in O(1) by giving the rolling hash function Ru the internal state S0 obtained in the first calculation, an argument indicating that A is removed from the first character string (-"A"), and an argument indicating that F is added (+"F"). The remaining character strings likewise cost O(1) each, so the total is approximately O(n), and the calculation amount is greatly reduced compared with using only the hash function H.

なお、上記で示した全体での計算量は、文字列の長さｎが文字列Ｌの長さに対して十分に長い場合の収束値である。 The overall calculation amount shown above is a convergence value when the length n of the character string is sufficiently longer than the length of the character string L.
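The O(1) update described for FIG. 6 can be illustrated with a polynomial (Rabin-Karp style) rolling hash. The embodiment does not specify which rolling hash function Ru is used, so the construction below, the constants `BASE` and `MOD`, and the function names are assumptions for this sketch. In this simple construction the internal state St and the hash value Ct coincide.

```python
from typing import List

BASE = 257             # assumed multiplier, not taken from the embodiment
MOD = (1 << 61) - 1    # assumed modulus (a Mersenne prime)


def initial_hash(window: bytes) -> int:
    """Hash a whole window from scratch; cost is proportional to its length."""
    h = 0
    for byte in window:
        h = (h * BASE + byte) % MOD
    return h


def roll(h: int, out_byte: int, in_byte: int, top: int) -> int:
    """Slide the window by one byte in O(1).

    `top` must equal BASE ** (L - 1) % MOD for window length L.
    The outgoing byte is removed (the -"A" step) and the incoming byte
    is appended (the +"F" step).
    """
    h = (h - out_byte * top) % MOD
    return (h * BASE + in_byte) % MOD


def rolling_fingerprints(data: bytes, length: int) -> List[int]:
    """Fingerprints of every window of `length` bytes in `data`."""
    if length < 1 or len(data) < length:
        return []
    top = pow(BASE, length - 1, MOD)
    h = initial_hash(data[:length])
    fingerprints = [h]
    for i in range(length, len(data)):
        h = roll(h, data[i - length], data[i], top)
        fingerprints.append(h)
    return fingerprints


fps = rolling_fingerprints(b"ABCDEFGH", 5)  # windows ABCDE, BCDEF, CDEFG, DEFGH
```

Only the first window is hashed from scratch. Each later fingerprint is derived from the previous one with one subtraction, one multiplication and one addition, which is the source of the approximately O(n) total cost.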

以下、図７及び図８を用いて、具体的にフラグメントの保存処理を説明する。 Hereinafter, the fragment storing process will be specifically described with reference to FIGS. 7 and 8.

図７は、フラグメント保存処理を例示するフローチャートである。 FIG. 7 is a flowchart illustrating fragment storage processing.

図８は、フラグメント保存処理を例示する概念図である。図８に付した番号は、図７の各ステップに付した番号と対応する。 FIG. 8 is a conceptual diagram illustrating fragment storage processing. The numbers given in FIG. 8 correspond to the numbers given to the steps in FIG.

まず、保存する各オブジェクトには、ユーザＵ又は端末１１のアプリケーションが識別子を与える。これをオブジェクトＩＤと呼ぶ。オブジェクトＩＤは、例えば従来のファイルシステムにおけるファイル名の役割を持つ。 First, the user U or the application of the terminal 11 gives an identifier to each object to be stored. This is called an object ID. The object ID has a role of a file name in a conventional file system, for example.

ステップＳ７０１において、フロントエンドサーバ２は、ユーザＵの端末１１又は端末１１のアプリケーションからオブジェクトＯＢの保存要求を受けると、送受信部２２を経由して、オブジェクト記憶部２１１にオブジェクトＯＢを格納する。 In step S 701, when the front-end server 2 receives a request to save the object OB from the terminal 11 of the user U or the application of the terminal 11, the front-end server 2 stores the object OB in the object storage unit 211 via the transmission / reception unit 22.

なお、送受信部２２を経由して受信したオブジェクトＯＢが直接メモリ２４に格納される場合は、オブジェクトＯＢはオブジェクト記憶部２１１に保存されなくてもよい。 When the object OB received via the transmission / reception unit 22 is directly stored in the memory 24, the object OB does not have to be stored in the object storage unit 211.

ステップＳ７０２において、オブジェクト分割部２０１は、オブジェクトＯＢをメモリ２４に読み出し、ストライドＳＴに分割する。この際、オブジェクトＯＢのサイズがストライドＳＴのサイズの整数倍でない場合は、オブジェクトＯＢに対しストライドＳＴの整数倍と等しいサイズまでパディングする。図８（Ｓ７０２）の例では、オブジェクト分割部２０１は、オブジェクトＯＢのサイズがストライドのサイズの３倍となるようにパディングが行われ、ストライドＡ，Ｂ，Ｃに分割される。 In step S702, the object dividing unit 201 reads the object OB into the memory 24 and divides it into strides ST. If the size of the object OB is not an integer multiple of the size of the stride ST, the object OB is padded up to a size equal to an integer multiple of the stride ST. In the example of FIG. 8 (S702), the object OB is padded so that its size is three times the stride size, and it is divided into strides A, B, and C.
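Step S702 can be sketched as follows. The padding byte is not specified in the embodiment, so the zero byte used here, like the function name `split_into_strides`, is an assumption for illustration.

```python
from typing import List


def split_into_strides(obj: bytes, stride_size: int, pad: bytes = b"\x00") -> List[bytes]:
    """Pad `obj` to a multiple of `stride_size` and cut it into strides."""
    if stride_size < 1:
        raise ValueError("stride_size must be one or more")
    remainder = len(obj) % stride_size
    if remainder:
        obj += pad * (stride_size - remainder)
    return [obj[i:i + stride_size] for i in range(0, len(obj), stride_size)]


# An 8-byte object with a 3-byte stride is padded to 9 bytes and split in three.
strides = split_into_strides(b"abcdefgh", 3)
```

A real system would also need to record the original object size so that the padding can be removed when the object is read back; that bookkeeping is outside this sketch.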

ステップＳ７０３において、消失符号計算部２０２は、各ストライドＡ，Ｂ，Ｃに対して消失符号による符号化を行い、情報シンボル部ＩＳとパリティシンボル部ＰＳに含まれるそれぞれのフラグメントを生成する。図８（Ｓ７０３）の例では、消失符号のパラメータはｋ＝４、ｍ＝２であるとする。消失符号計算部２０２がストライドＡに対して符号化を行った結果、情報シンボル部ＩＳとして４つのフラグメントＡ１，Ａ２，Ａ３，Ａ４、パリティシンボル部ＰＳとして２つのフラグメントＡ５，Ａ６が生成される。ストライドＢ及びＣについても、情報シンボル部ＩＳとして４つのフラグメントが生成され、パリティシンボル部ＰＳとして２つのフラグメントが生成される。 In step S703, the erasure code calculation unit 202 encodes each of the strides A, B, and C using an erasure code to generate respective fragments included in the information symbol part IS and the parity symbol part PS. In the example of FIG. 8 (S703), it is assumed that the parameters of the erasure code are k = 4 and m = 2. As a result of the erasure code calculation unit 202 encoding the stride A, four fragments A1, A2, A3 and A4 are generated as the information symbol part IS, and two fragments A5 and A6 are generated as the parity symbol part PS. Also for the strides B and C, four fragments are generated as the information symbol part IS, and two fragments are generated as the parity symbol part PS.

ステップＳ７０４において、フィンガープリント計算部２０３は、各ストライドＡ，Ｂ，Ｃに対してローリングハッシュを用いてフィンガープリントを計算する。計算されたフィンガープリントは、フィンガープリントセット２１２に格納される。図８（Ｓ７０４）の例では、フロントエンドサーバ２が各ストライドＡ，Ｂ，Ｃに対してローリングハッシュを用いてフィンガープリントを計算した結果、それぞれフィンガープリントＦＰ＿Ａ，ＦＰ＿Ｂ，ＦＰ＿Ｃが生成される。 In step S 704, the fingerprint calculation unit 203 calculates a fingerprint for each stride A, B, and C using a rolling hash. The calculated fingerprint is stored in the fingerprint set 212. In the example of FIG. 8 (S704), as a result of the front end server 2 calculating the fingerprint for each stride A, B, and C using the rolling hash, the fingerprints FP_A, FP_B, and FP_C are generated.

ステップＳ７０５において、制御部２５は、各ストライドＡ，Ｂ，Ｃに対するフィンガープリントＦＰ＿Ａ，ＦＰ＿Ｂ，ＦＰ＿Ｃを用いてディスク選択インデックスファイル２１３を参照し、各フラグメントの保存先のフラグメントサーバ３を決定する。各ストライドＡ，Ｂ，Ｃに対するフィンガープリントＦＰ＿Ａ，ＦＰ＿Ｂ，ＦＰ＿Ｃにより保存先のフラグメントサーバ３が決まるため、同一の内容を持つストライドＳＴは必ず同一のフラグメントサーバ３に保存される。図８（Ｓ７０５）の例では、ＦＰ＿Ａの値よりディスクセットＩＤ＝２が決定され、図４に示されるディスク選択インデックスファイル２１３を参照することにより、ストライドＡに属するフラグメントの保存先の配列が[６,１,０,９,３,７]と決定される。配列[６,１,０,９,３,７]のそれぞれの要素は、フラグメントＡ１〜Ａ６に対応する保存先のフラグメントサーバ３の番号を示す。ストライドＢ及びＣについても同様にフラグメントの保存先が決定される。 In step S705, the control unit 25 refers to the disk selection index file 213 using the fingerprints FP_A, FP_B, and FP_C of the strides A, B, and C, and determines the fragment server 3 in which each fragment is to be stored. Because the destination fragment servers 3 are determined by the fingerprints FP_A, FP_B, and FP_C of the strides A, B, and C, strides ST with identical content are always stored in the same fragment servers 3. In the example of FIG. 8 (S705), disk set ID = 2 is determined from the value of FP_A, and by referring to the disk selection index file 213 shown in FIG. 4, the array of storage destinations for the fragments belonging to stride A is determined to be [6, 1, 0, 9, 3, 7]. Each element of the array [6, 1, 0, 9, 3, 7] indicates the number of the destination fragment server 3 for the corresponding fragment A1 to A6. The storage destinations of the fragments of strides B and C are determined in the same way.
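The lookup in step S705 can be sketched as a mapping from fingerprint to disk set ID to per-fragment destination. How the disk set ID is derived from the fingerprint is not stated at this point in the text; taking the fingerprint modulo the number of columns is an assumption of this sketch. Columns 0 and 1 of the table are invented filler; column 2 is the array from the FIG. 8 example, and column 3 holds the first six entries given for disk set ID = 3.

```python
from typing import Dict, List

INDEX: List[List[int]] = [
    [1, 2, 3, 4, 5, 6],   # hypothetical column, not from the embodiment
    [9, 8, 7, 6, 5, 4],   # hypothetical column, not from the embodiment
    [6, 1, 0, 9, 3, 7],   # disk set ID 2, as in the FIG. 8 example
    [0, 3, 8, 6, 7, 4],   # disk set ID 3, first six entries given in the text
]


def disk_set_id_for(fingerprint: int) -> int:
    """Assumed mapping: fingerprint modulo the number of columns."""
    return fingerprint % len(INDEX)


def placement(fingerprint: int, fragment_names: List[str]) -> Dict[str, int]:
    """Map each fragment of one stride to its destination fragment server."""
    disk_set = INDEX[disk_set_id_for(fingerprint)]
    if len(fragment_names) > len(disk_set):
        raise ValueError("more fragments than disks in the disk set")
    return dict(zip(fragment_names, disk_set))


fragments_a = ["A1", "A2", "A3", "A4", "A5", "A6"]
dest_a = placement(2, fragments_a)   # a fingerprint that maps to disk set ID 2
```

Because the destination depends only on the fingerprint, two strides with the same content obtain the same placement, which is what allows a fragment server to see both copies.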

ステップＳ７０６において、制御部２５は、送受信部２２を通じて、各フラグメントを保存先のフラグメントサーバ３に送信する。図８（Ｓ７０６）の例において、ステップＳ７０５で参照されたディスク選択インデックスファイル２１３の内容に沿って、各フラグメントがフラグメントサーバ３へ保存される。例えば、ストライドＡに属するフラグメントＡ１〜Ａ６であれば、フラグメントＡ１がフラグメントサーバＫ６に、フラグメントＡ２がフラグメントサーバＫ１に、フラグメントＡ３がフラグメントサーバＫ０に、フラグメントＡ４がフラグメントサーバＫ９に、フラグメントＡ５がフラグメントサーバＫ３に、フラグメントＡ６がフラグメントサーバＫ７に保存される。ストライドＢ及びＣについても同様に各フラグメントが保存先のフラグメントサーバ３へ保存される。 In step S706, the control unit 25 transmits each fragment to its destination fragment server 3 through the transmission/reception unit 22. In the example of FIG. 8 (S706), each fragment is stored in a fragment server 3 in accordance with the contents of the disk selection index file 213 referred to in step S705. For example, for the fragments A1 to A6 belonging to stride A, fragment A1 is stored in fragment server K6, fragment A2 in fragment server K1, fragment A3 in fragment server K0, fragment A4 in fragment server K9, fragment A5 in fragment server K3, and fragment A6 in fragment server K7. The fragments of strides B and C are likewise stored in their destination fragment servers 3.

なお、各フラグメントを保存する際、各フラグメントと共にメタデータを保存する。本実施形態では、メタデータの一例として、保存対象のフラグメントを先頭とする１ストライド分のフィンガープリントと、ローリングハッシュを用いて計算した際に得られる内部状態とを保存する（これを、ロールサム状態（roll sum state）とする）。また、重複排除処理が行われたかどうかを示すフラグを保存する（これを、参照カウンタ値とする）。例えば、参照カウンタ値の初期値は１であり、重複排除のために参照されるたびに数値が増加してもよい。メタデータの詳細については表１乃至表５にて後述する。 When each fragment is stored, metadata is stored together with it. In the present embodiment, as one example of metadata, the fingerprint of the one-stride span that starts at the fragment to be stored and the internal state obtained when that fingerprint is calculated with the rolling hash are stored (this internal state is referred to as the roll sum state). A flag indicating whether deduplication processing has been performed is also stored (this is referred to as the reference counter value). For example, the initial value of the reference counter value is 1, and the value may be incremented each time the fragment is referenced for deduplication. Details of the metadata are described later with reference to Tables 1 to 5.

以下表１乃至表５を用いて、各フラグメントに付すメタデータについて説明する。 The metadata attached to each fragment will be described below using Tables 1 to 5.

表１は、本実施形態で用いるメタデータを例示する表である。

Table 1 is a table illustrating metadata used in this embodiment.

フラグメントをフラグメントサーバ３へ保存する際、オブジェクトＯＢ全体の構成情報及び各ストライドＳＴの構成情報など、様々な付随情報が必要となる。この付随情報をメタデータと呼ぶ。メタデータは、オブジェクトＯＢの保存時にフラグメントサーバ３に保存される。 When the fragment is stored in the fragment server 3, various accompanying information such as configuration information of the entire object OB and configuration information of each stride ST is required. This accompanying information is called metadata. The metadata is stored in the fragment server 3 when the object OB is stored.

なお、表１で例示する各メタデータは、例えばキー・バリュー構成をとる。キー・バリュー構成とは、データがキー情報と値の組み合わせで保存されており、キー情報を指定することで値を読み出せる構成である。 Each metadata exemplified in Table 1 has a key / value structure, for example. The key-value configuration is a configuration in which data is stored as a combination of key information and value, and the value can be read by specifying the key information.

各メタデータ値には、メタデータを参照するためのキー情報が与えられるが、異なるメタデータ値に対して同一のキーが使用される場合がある。そのため、各メタデータにはキー・プレフィックスが割り当てられ、保存時の実キーにはこのキー・プレフィックスとキー情報を結合した値が使用される。 Each metadata value is given key information for referring to the metadata, but the same key may be used for different metadata values. Therefore, a key prefix is assigned to each metadata, and a value obtained by combining this key prefix and key information is used as an actual key at the time of storage.
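The role of the key prefix can be illustrated with a small sketch. The prefix strings, the separator, and the function name `real_key` are hypothetical; the actual prefixes are those listed in Table 1.

```python
from typing import Dict


def real_key(prefix: str, key_info: str, sep: str = ":") -> str:
    """Combine a key prefix and key information into the key actually stored."""
    return f"{prefix}{sep}{key_info}"


store: Dict[str, str] = {}
# The same key information ("c1") is used for two kinds of metadata.
store[real_key("objmeta", "c1")] = "object metadata for counter c1"
store[real_key("stridemeta", "c1")] = "stride metadata for counter c1"
```

Without the prefix, the second write would overwrite the first because both use the same key information.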

表１中の「複製可能？」欄はそのメタデータが複製されて保存されるか否かを示している。フラグメントそのものに関するメタデータは複製されず、他のメタデータについては複製される。 The “replicatable?” Column in Table 1 indicates whether or not the metadata is copied and stored. Metadata about the fragment itself is not duplicated, and other metadata is duplicated.

各キー及びメタデータの概要は以下のとおりである。 The outline of each key and metadata is as follows.

本実施形態において、オブジェクトＯＢには、オブジェクトＩＤに加えて、内部の識別子が与えられる。この内容の識別子をカウンタと呼ぶ。例えば、カウンタは、符号なし６４ビットの単調増加する値とする。本実施形態において、カウンタは、例えば、第１の情報、第２の情報、フロントエンドサーバＩＤを含む。 In the present embodiment, an internal identifier is given to the object OB in addition to the object ID. This identifier is called a counter. For example, the counter is an unsigned 64-bit monotonically increasing value. In the present embodiment, the counter includes, for example, first information, second information, and a front-end server ID.

第１の情報は、対応するオブジェクトＯＢが内部的に割り当てられた際の第１の時刻における秒の部分を示す。 The first information indicates the seconds portion of the first time, which is the time at which the corresponding object OB was internally allocated.

第２の情報は、第１の時刻と第２の時刻との差をマイクロ秒で表した時刻の値を所定の値（例えば２５）で割った値である。 The second information is a value obtained by dividing a time value representing the difference between the first time and the second time in microseconds by a predetermined value (for example, 25).

フロントエンドサーバＩＤは、各フロントエンドサーバ２に割り当てられる固有の識別子である。カウンタにこのフロントエンドサーバＩＤが含まれることで、例えば同一の時刻に異なるフロントエンドサーバ２でカウンタが割り当てられたとしても、このカウンタの値は異なることが保証される。 The front end server ID is a unique identifier assigned to each front end server 2. By including this front-end server ID in the counter, for example, even if the counter is assigned by different front-end servers 2 at the same time, it is guaranteed that the value of this counter is different.

また、ユーザＵ又は端末１１のアプリケーションが異なるオブジェクトに同じオブジェクトＩＤを与えて保存する場合であっても、カウンタの値は異なることが保証される。 Even when the application of the user U or the terminal 11 gives the same object ID to different objects and saves them, it is guaranteed that the counter values are different.

本実施形態において、各フロントエンドサーバ２で管理されている時刻は、同期されていると仮定する。この場合、カウンタを比較することで、複数のオブジェクトＯＢ又はオブジェクトＯＢに関連する複数のメタデータのいずれが新しいかを比較することが可能である。 In the present embodiment, it is assumed that the time managed by each front-end server 2 is synchronized. In this case, it is possible to compare which of the plurality of objects OB or the plurality of metadata related to the object OB is new by comparing the counters.
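One way to pack the three parts of the counter into an unsigned 64-bit value is sketched below. The embodiment does not give the bit layout, so the 32/16/16-bit split, the treatment of the second information as the sub-second microseconds divided by 25, and the function name `make_counter` are all assumptions for illustration.

```python
def make_counter(seconds: int, microseconds: int, frontend_id: int,
                 divisor: int = 25) -> int:
    """Pack (seconds, microseconds // divisor, front-end server ID) into 64 bits.

    Assumed layout: upper 32 bits hold the seconds, the next 16 bits hold
    the sub-second ticks, and the lower 16 bits hold the front-end server ID.
    """
    ticks = microseconds // divisor
    if not 0 <= seconds < (1 << 32):
        raise ValueError("seconds out of range")
    if not 0 <= ticks < (1 << 16):
        raise ValueError("sub-second ticks out of range")
    if not 0 <= frontend_id < (1 << 16):
        raise ValueError("front-end server ID out of range")
    return (seconds << 32) | (ticks << 16) | frontend_id


same_time_fe1 = make_counter(100, 50, frontend_id=1)
same_time_fe2 = make_counter(100, 50, frontend_id=2)
later = make_counter(100, 75, frontend_id=1)
```

With the time in the upper bits, comparing two counters as integers orders them by time, and the front-end server ID in the lower bits keeps counters issued at the same instant by different front-end servers distinct.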

オブジェクト・メタ・カウンタは、キー情報に使用されているオブジェクトＩＤを持つオブジェクトのカウンタのリストであり、最大のカウンタが最新のオブジェクトとなる。オブジェクト・メタデータは、オブジェクトのメタデータの集合であり、一例を後述の表２に示す。ストライド・メタデータは、各ストライドに関するメタデータの集合であり、一例を後述の表３に示す。ストライドＩＤは、そのオブジェクトで何番目のストライドかを示す番号である。フラグメント・メタデータは、各フラグメントに関するメタデータであり、一例を後述の表４に示す。フラグメント位置情報は、フラグメントの位置によりフラグメントの実体を特定するためのメタデータであり、カウンタ、外部ストライドＩＤ（後述）、フラグメントＩＤにより定まる。参照カウント値は、フラグメント参照数であり、重複排除されている場合には２以上の値を持つ。誤り訂正符号は、例えばCyclic Redundancy Check（ＣＲＣ）などの誤り検出に用いられる符号である。ピボットは、ドライブ側で探索された想定重複位置であり、内部ストライドＩＤ、オフセット、ピボット長の３つの組で表現される。ピボットの詳細については、後述する。 The object meta counter is a list of the counters of the objects that have the object ID used as the key information, and the largest counter corresponds to the latest object. The object metadata is a set of metadata about an object, and an example is shown in Table 2 below. The stride metadata is a set of metadata about each stride, and an example is shown in Table 3 below. The stride ID is a number indicating the position of the stride within the object. The fragment metadata is metadata about each fragment, and an example is shown in Table 4 below. The fragment position information is metadata for identifying the fragment entity from the position of the fragment, and is determined by a counter, an external stride ID (described later), and a fragment ID. The reference count value is the number of references to a fragment, and has a value of 2 or more when the fragment has been deduplicated. The error correction code is a code used for error detection, for example a Cyclic Redundancy Check (CRC). The pivot is an assumed duplicate position found by the search on the drive side, and is expressed as a set of three values: internal stride ID, offset, and pivot length. Details of the pivot are described later.

同じ誤り訂正符号を持つフラグメントＩＤの集合は、比較前（ＮＣ：not compared)の集合と比較後（ＣＣ：compared）の集合の２種類の集合に分類されて保存される。フロントエンドサーバ２による保存時には比較前の集合として保存される。 A set of fragment IDs having the same error correction code is classified and stored in two types of sets, a set before comparison (NC: not compared) and a set after comparison (CC: compared). When saved by the front-end server 2, it is saved as a set before comparison.

表２は、オブジェクト・メタデータの内容を例示する表である。

Table 2 is a table illustrating the contents of the object metadata.

表３は、ストライド・メタデータの内容を例示する表である。

Table 3 is a table illustrating the content of stride metadata.

表４は、フラグメント・メタデータの内容を例示する表である。

Table 4 is a table illustrating the contents of fragment metadata.

また、各メタデータの保存先については、メタデータが複製される場合にはディスクセットＩＤを求める必要があり、メタデータが複製されない場合にはディスクＩＤを求める必要がある。 As for the storage destination of each metadata, it is necessary to obtain the disk set ID when the metadata is duplicated, and it is necessary to obtain the disk ID when the metadata is not duplicated.

表５は、メタデータとメタデータの保存先との関係を例示する表である。

Table 5 is a table illustrating the relationship between the metadata and the storage destination of the metadata.

表５のＨＲ（ｘ）は、ハッシュ関数Ｈ（引数ｘ）で求まる値を、ディスク選択インデックスファイル２１３の列数で割った余り値である。ディスクセットＩＤが得られる場合には、そのディスクセットの先頭からフラグメントＩＤ番目のディスクＩＤに対応するフラグメントサーバ３にメタデータが保存される。 HR (x) in Table 5 is a remainder value obtained by dividing the value obtained by the hash function H (argument x) by the number of columns of the disk selection index file 213. When the disk set ID is obtained, the metadata is stored in the fragment server 3 corresponding to the fragment ID-th disk ID from the top of the disk set.

メタデータ間の関係を明らかにするために、ユーザＵの使用する端末１１がオブジェクトを読み出す際の手順を例に各メタデータの関係について述べる。 In order to clarify the relationship between metadata, the relationship between each metadata will be described by taking as an example a procedure when the terminal 11 used by the user U reads an object.

図９は、オブジェクトＩＤ＝ｏ１であるオブジェクトの読み出し手順を例示するフローチャートである。 FIG. 9 is a flowchart illustrating a procedure for reading an object with object ID = o1.

ステップＳ９０１において、フロントエンドサーバ２は、オブジェクトＩＤであるｏ１をキー情報として、フラグメントサーバ３に保存されているオブジェクトカウンタを読み出す。オブジェクトカウンタの保存先のディスクセットＩＤは、ハッシュ関数ＨＲ（引数ｏ１）により求められる。 In step S901, the front-end server 2 reads out the object counter stored in the fragment server 3 using o1 that is the object ID as key information. The disk set ID of the storage destination of the object counter is obtained by the hash function HR (argument o1).

ステップＳ９０２において、フロントエンドサーバ２は、得られたオブジェクトカウンタから、最大のカウンタを求め（この値をｃ１とする)、ｃ１をキーとしてオブジェクト・メタデータを読み出す。オブジェクト・メタデータの保存先のディスクセットＩＤは、ハッシュ関数ＨＲ（引数ｃ１）により求められる。 In step S902, the front-end server 2 obtains the maximum counter from the obtained object counter (this value is c1), and reads the object metadata using c1 as a key. The disk set ID of the object metadata storage destination is obtained by the hash function HR (argument c1).

ステップＳ９０３において、フロントエンドサーバ２は、得られたオブジェクト・メタデータに含まれるルートストライドディスクセットＩＤ（最初のストライド・メタデータが保存されているディスクセットＩＤ、表２参照）を得て、カウンタｃ１をキーとしてストライド・メタデータを読み出す。 In step S903, the front-end server 2 obtains the root stride disk set ID (disk set ID in which the first stride metadata is stored, see Table 2) included in the obtained object metadata, and the counter The stride metadata is read using c1 as a key.

次に、フロントエンドサーバ２は、ストライド内のフラグメントを読み出す。 Next, the front-end server 2 reads the fragment in the stride.

ステップＳ９０４において、フロントエンドサーバ２は、ストライド・メタデータと同一のディスクセットＩＤを用いて、例えばｉ番目のフラグメント・メタデータを得る。 In step S904, the front-end server 2 obtains, for example, the i-th fragment metadata using the same disk set ID as the stride metadata.

ステップＳ９０５において、フロントエンドサーバ２は、フラグメント・メタデータに含まれるフラグメントＩＤ（表４参照）を得て、フラグメント・メタデータと同一のディスクＩＤからフラグメント位置情報を読み出す。そして、フロントエンドサーバ２は、フラグメント位置情報に基づいて、フラグメントの内容を読み出す。 In step S905, the front-end server 2 obtains the fragment ID included in the fragment metadata (see Table 4), and reads the fragment position information from the same disk ID as the fragment metadata. The front-end server 2 then reads the content of the fragment based on the fragment position information.

ステップＳ９０６において、フロントエンドサーバ２は、すべてのフラグメントを読み出した後、ストライド・メタデータに含まれる次ストライドディスクセットＩＤ（次のストライドが含まれるディスクセットＩＤ、表３参照）を得て、次のストライド・メタデータを読み出す。 In step S906, after reading all the fragments, the front-end server 2 obtains the next stride disk set ID (disc set ID including the next stride, see Table 3) included in the stride metadata, Read stride metadata for.

ステップＳ９０７において、フロントエンドサーバ２は、次のストライドが存在する場合は、ステップＳ９０４に戻り、フラグメントの読み出し処理を繰り返す。最後のストライドを読み出した場合、フロントエンドサーバ２は、オブジェクトの読み出し処理を終了する。 In step S907, if there is a next stride, the front-end server 2 returns to step S904 and repeats the fragment reading process. When the last stride is read, the front-end server 2 ends the object reading process.
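As a non-limiting illustration, the reading procedure of steps S901 to S907 may be sketched as follows. The dictionary layout, the key tuples, and the names `read_object` and `demo_store` are assumptions introduced only for this sketch; the embodiment locates each item through the hash function HR and the disk set IDs rather than through an in-memory dictionary.

```python
# Hypothetical in-memory stand-in for the data held by the fragment servers.
demo_store = {
    "object_counters": {"o1": [3, 7]},
    "object_meta": {7: {"root_stride_disk_set": "ds0"}},
    "stride_meta": {
        ("ds0", 7, 0): {"fragment_count": 2, "next_stride_disk_set": "ds1"},
        ("ds1", 7, 1): {"fragment_count": 1, "next_stride_disk_set": None},
    },
    "fragment_meta": {
        ("ds0", 7, 0, 0): {"fragment_id": "f1"},
        ("ds0", 7, 0, 1): {"fragment_id": "f2"},
        ("ds1", 7, 1, 0): {"fragment_id": "f1"},  # deduplicated reference
    },
    "fragments": {"f1": b"AB", "f2": b"CD"},
}


def read_object(store, object_id):
    """Walk counter -> object metadata -> stride metadata -> fragments."""
    c1 = max(store["object_counters"][object_id])         # S901, S902
    obj_meta = store["object_meta"][c1]
    disk_set = obj_meta["root_stride_disk_set"]           # S903
    stride_index = 0
    data = bytearray()
    while disk_set is not None:                           # S907 loop
        stride_meta = store["stride_meta"][(disk_set, c1, stride_index)]
        for i in range(stride_meta["fragment_count"]):    # S904, S905
            key = (disk_set, c1, stride_index, i)
            fragment_id = store["fragment_meta"][key]["fragment_id"]
            data += store["fragments"][fragment_id]
        disk_set = stride_meta["next_stride_disk_set"]    # S906
        stride_index += 1
    return bytes(data)
```

In this sketch the third fragment reuses the fragment ID `f1`, which shows how a deduplicated fragment is read back through its metadata without any special handling in the read path.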

図１０は、重複するフラグメントの排除処理の具体例を示す図である。 FIG. 10 is a diagram illustrating a specific example of the duplicate fragment elimination process.

As described above, the front-end server 2 calculates the same fingerprint for the same stride and stores the fragments divided from the same stride in the same fragment server 3. Fragments that have the same error correction code are also likely to have the same content. The fragment server 3 periodically compares fragments having the same error correction code that are included in NC (the set of fragment IDs not yet compared for deduplication; see Table 1), and performs deduplication if the fragments having the same error correction code have identical content. If the fragments having the same error correction code do not have identical content, the fragment server 3 moves the fragment IDs of the compared fragments from NC to CC (the set of fragment IDs already compared for deduplication; see Table 1).

重複排除処理を行う場合、重複排除処理部３０１は、次の第１乃至第４の重複排除処理を実行する。なお、重複排除で残すフラグメントをfrag_o、重複排除で削除するフラグメントをfrag_nと表記する。 When performing deduplication processing, the deduplication processing unit 301 performs the following first to fourth deduplication processing. It should be noted that a fragment left by deduplication is denoted as frag_o, and a fragment to be deleted by deduplication is denoted as frag_n.

第１の処理は、フラグメント・メタデータ内のフラグメントfrag_nのフラグメントＩＤを、フラグメントfrag_oのフラグメントＩＤに変更する処理である。 The first process is a process of changing the fragment ID of the fragment frag_n in the fragment metadata to the fragment ID of the fragment frag_o.

第２の処理は、フラグメントfrag_oの参照カウント値をインクリメントする処理である。 The second process is a process of incrementing the reference count value of the fragment frag_o.

第３の処理は、同一の誤り訂正符号を持つフラグメントＩＤの比較前の集合から、フラグメントfrag_nのフラグメントＩＤを削除する処理である。 The third process is a process of deleting the fragment ID of the fragment frag_n from the set before the comparison of the fragment IDs having the same error correction code.

第４の処理は、フラグメントfrag_nを削除する処理である。 The fourth process is a process for deleting the fragment frag_n.

メタデータ１００１は、重複排除処理前のフラグメントfrag_oのメタデータである。メタデータ１００１におけるフラグメントfrag_oのオブジェクトＩＤをoid_o、ストライドＩＤをsi_o、フラグメントＩＤをfid_oとする。 The metadata 1001 is metadata of the fragment frag_o before deduplication processing. The object ID of the fragment frag_o in the metadata 1001 is oid_o, the stride ID is si_o, and the fragment ID is fid_o.

メタデータ１００２は、重複排除処理前のフラグメントfrag_nのメタデータである。メタデータ１００２におけるフラグメントfrag_nのオブジェクトＩＤをoid_n、ストライドＩＤをsi_n、フラグメントＩＤをfid_nとする。 The metadata 1002 is metadata of the fragment frag_n before deduplication processing. The object ID of the fragment frag_n in the metadata 1002 is oid_n, the stride ID is si_n, and the fragment ID is fid_n.

フロントエンドサーバ２によって保存される時点で、フラグメントfrag_nとフラグメントfrag_oが同一だったと仮定する。この場合、フラグメントfrag_oの誤り訂正符号crc_oとフラグメントfrag_nの誤り訂正符号crc_nは同一の値となる。従って、重複排除処理部３０１は、（ＣＣ，crc_o）をキーとして参照されるフラグメントＩＤfid_oと、（ＮＣ，crc_o）をキーとして参照されるフラグメントＩＤfid_nとを比較対象として決定する。 Assume that the fragment frag_n and the fragment frag_o are the same when saved by the front-end server 2. In this case, the error correction code crc_o of the fragment frag_o and the error correction code crc_n of the fragment frag_n have the same value. Accordingly, the deduplication processing unit 301 determines the fragment ID fid_o referred to using (CC, crc_o) as a key and the fragment ID fid_n referred to using (NC, crc_o) as a key for comparison.

Next, the deduplication processing unit 301 refers to (BK, fid_o) and (BK, fid_n) to acquire the fragments frag_o and frag_n as the entities indicated by the fragment IDs fid_o and fid_n, respectively, and executes the deduplication processing.

The metadata 1003 is the metadata of frag_o after the deduplication processing. The metadata 1004 is the metadata of frag_n after the deduplication processing. Because the fragment frag_o and the fragment frag_n match, the deduplication processing unit 301 deletes the fragment frag_n and, in the metadata 1004, replaces the fragment ID fid_n with the fragment ID fid_o in the fragment metadata of frag_n. That is, the deduplication processing unit 301 changes the entity that the fragment frag_n points to so that it is the fragment frag_o.

また、重複排除処理部３０１は、メタデータ１００３において、フラグメントfid_oの参照カウント値を２に変更し、メタデータ１００３に重複排除が行われたことを記録する。 Further, the deduplication processing unit 301 changes the reference count value of the fragment fid_o to 2 in the metadata 1003, and records that deduplication has been performed in the metadata 1003.
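As a non-limiting illustration, the first to fourth processes and the resulting change from the metadata 1001 and 1002 to the metadata 1003 and 1004 may be sketched as follows. The dictionary layout and the names `dedup_pair`, `demo` and `refcount` are assumptions made for this sketch; only the key prefixes NC, CC and BK are taken from the description.

```python
def dedup_pair(store, crc, fid_o, fid_n):
    """Compare two fragments that share an error correction code."""
    if store["BK"][fid_o] != store["BK"][fid_n]:
        # Same code but different content: move fid_n from NC to CC.
        store["NC"][crc].discard(fid_n)
        store["CC"].setdefault(crc, set()).add(fid_n)
        return False
    # First process: repoint metadata that referenced fid_n to fid_o.
    for meta in store["fragment_meta"].values():
        if meta["fragment_id"] == fid_n:
            meta["fragment_id"] = fid_o
    # Second process: increment the reference count of frag_o.
    store["refcount"][fid_o] += 1
    # Third process: remove fid_n from the not-yet-compared set.
    store["NC"][crc].discard(fid_n)
    # Fourth process: delete the body of frag_n.
    del store["BK"][fid_n]
    store["refcount"].pop(fid_n, None)
    return True


demo = {
    "BK": {"fid_o": b"same bytes", "fid_n": b"same bytes"},
    "NC": {0x1234: {"fid_n"}},
    "CC": {0x1234: {"fid_o"}},
    "refcount": {"fid_o": 1, "fid_n": 1},
    "fragment_meta": {
        ("oid_o", "si_o", 0): {"fragment_id": "fid_o"},
        ("oid_n", "si_n", 0): {"fragment_id": "fid_n"},
    },
}
deduplicated = dedup_pair(demo, 0x1234, "fid_o", "fid_n")
```

The byte-for-byte comparison before the four processes reflects the point that an identical error correction code only makes identical content likely, not certain.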

上述の処理によって、固定位置での重複排除を実現することができる。続いて、自由位置での重複排除の方法について述べる。 By the above process, deduplication at a fixed position can be realized. Next, a method for deduplication at a free position will be described.

図１１は、フィンガープリントセットを例示する図である。 FIG. 11 is a diagram illustrating a fingerprint set.

The front-end server 2 divides the first fragment of a stride ST into two halves and calculates a fingerprint for the first half and a fingerprint for the second half. The front-end server 2 stores these two fingerprints of the first fragment together with the fingerprint of the stride ST. This collection is called a fingerprint set.

図１１の例では、ストライドＡについてフロントエンドサーバ２が保持するフィンガープリントセット２１２には、フラグメントＡ１の前半部分のフィンガープリントＦＰ＿ａ１、フラグメントＡ１の後半部分のフィンガープリントＦＰ＿ａ２、ストライドＡ全体のフィンガープリントＦＰ＿Ａが含まれる。なお、ストライドＢ及びストライドＣについても同様である。以下では、フラグメント長の半分の長さのフィンガープリントを、１／２フィンガープリントと呼ぶ。 In the example of FIG. 11, the fingerprint set 212 held by the front-end server 2 for the stride A includes the fingerprint FP_a1 of the first half of the fragment A1, the fingerprint FP_a2 of the second half of the fragment A1, and the fingerprint FP_A of the entire stride A. Is included. The same applies to stride B and stride C. Hereinafter, a fingerprint that is half the fragment length is referred to as a ½ fingerprint.
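As a non-limiting illustration, the construction of one fingerprint set may be sketched as follows. The use of SHA-256 from the standard `hashlib` module is an assumption made for brevity and is not stated in the embodiment; the function names `fp` and `fingerprint_set` and the dictionary keys are likewise hypothetical.

```python
import hashlib


def fp(data: bytes) -> str:
    """Stand-in fingerprint function (assumption: truncated SHA-256)."""
    return hashlib.sha256(data).hexdigest()[:16]


def fingerprint_set(stride: bytes, fragment_len: int) -> dict:
    """Return the two half fingerprints of the first fragment and the stride fingerprint."""
    first_fragment = stride[:fragment_len]
    half = fragment_len // 2
    return {
        "fp_half1": fp(first_fragment[:half]),   # e.g. FP_a1
        "fp_half2": fp(first_fragment[half:]),   # e.g. FP_a2
        "fp_stride": fp(stride),                 # e.g. FP_A
    }


stride_a = b"abcdefghijklmnop"  # four fragments of four bytes each
fps_a = fingerprint_set(stride_a, fragment_len=4)
```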

図１２は、フロントエンドサーバ２におけるフィンガープリントセットの保存形式と探索方法を例示する図である。 FIG. 12 is a diagram illustrating a fingerprint set storage format and search method in the front-end server 2.

フロントエンドサーバ２は、フィンガープリントセット２１２の保存量が一定量に達した場合、フィンガープリントセット２１２をフラグメントサーバ３へ送信する。フラグメントサーバ３は、受信したフィンガープリントセット２１２を、記憶部３１に保存する。フィンガープリントセット２１２及びフィンガープリントセット３１１の内容は、例えばブルームフィルタ（bloom filter）を用いることにより圧縮して保存されてもよい。ブルームフィルタが用いられる場合、フィンガープリントセットの内容は、１つのブルームフィルタにまとめられて保存される。 The front-end server 2 transmits the fingerprint set 212 to the fragment server 3 when the storage amount of the fingerprint set 212 reaches a certain amount. The fragment server 3 stores the received fingerprint set 212 in the storage unit 31. The contents of the fingerprint set 212 and the fingerprint set 311 may be compressed and stored by using, for example, a bloom filter. When a Bloom filter is used, the contents of the fingerprint set are stored together in one Bloom filter.

A Bloom filter is a probabilistic data structure represented by a bit array, and is useful for determining whether an element is a member of a set. For example, when a Bloom filter is applied to a character string search, it returns true when a character string V is included in a predetermined set W of character strings, and returns false when the character string V is not included in the set W. Because the structure is probabilistic, it may occasionally return true for a character string that is not in the set W (a false positive), but it never returns false for a character string that is in the set W.

As a specific calculation method of the Bloom filter, all elements of the set W of character strings are first converted into a single bit array using arbitrarily chosen hash functions. Next, the same hash functions are applied to the character string V to be compared, and a bit array is obtained. The bit array of the character string V and the bit array of the set W are then compared bit by bit to determine whether the set W contains the character string V. Specifically, if the bit array of the set W contains a 0 at even one of the positions where the bit array of the character string V is 1, it is determined that the character string V is not included in the set W.
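As a non-limiting illustration, the membership test described above may be sketched as follows. The class name, the bit array size, the number of hash functions, and the derivation of the hash functions from SHA-256 with a one-byte prefix are all assumptions made for this sketch.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # the bit array of the set W, held in one integer

    def _positions(self, item: bytes):
        # Derive num_hashes bit positions by prefixing a hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: bytes) -> bool:
        # A single 0 at a position where the item has a 1 means "not in W".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


empty_filter = BloomFilter()
w_filter = BloomFilter()
for member in (b"alpha", b"beta"):
    w_filter.add(member)
```

A result of `False` from `might_contain` is definitive, whereas a result of `True` only means that the item may be present, which is why a matching fingerprint is confirmed by a further comparison.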

In the present embodiment, the fingerprint calculation unit 203 calculates each fingerprint value shown in FIG. 11. Each fingerprint value is obtained by calculating a hash value over a value to which a fixed value, arbitrarily determined according to the type of fingerprint, has been concatenated. A counter is assigned to the resulting Bloom filter, and the Bloom filter is stored with the counter as key information (corresponding to FS in Table 1).

また、フィンガープリントセットＦＰＳの探索を容易にするために、フィンガープリントセットＦＰＳに使用されたカウンタの集合を保存できるフィンガープリントセットリストＦＰＳＬが生成される。フィンガープリントセットリストＦＰＳＬの生成時には、カウンタが割り当てられ、カウンタをキー情報としてフィンガープリントセットリストＦＰＳＬが保存される（表１のＦＬに該当する）。フィンガープリントセットリストＦＰＳＬのカウンタが一定値以上となった場合には、新しいフィンガープリントセットリストが生成される。 In order to facilitate the search for the fingerprint set FPS, a fingerprint set list FPSL that can store a set of counters used in the fingerprint set FPS is generated. When the fingerprint set list FPSL is generated, a counter is assigned, and the fingerprint set list FPSL is stored using the counter as key information (corresponding to FL in Table 1). If the counter of the fingerprint set list FPSL reaches a certain value or more, a new fingerprint set list is generated.

さらに、フィンガープリントセットリストＦＰＳＬで使用されているカウンタの集合を保持するために、フィンガープリントセットリストヘッドＦＰＳＬＨが生成される。フィンガープリントセットリストヘッドＦＰＳＬＨのキー情報は固定値として、システム全体で共有される。 In addition, a fingerprint set list head FPSLH is generated to hold a set of counters used in the fingerprint set list FPSL. The key information of the fingerprint set list head FPSLH is shared as a fixed value throughout the system.

As shown in FIG. 12, the disk set IDs of the storage destinations of the fingerprint set list FPSL and the fingerprint set FPS are obtained, for example, by the hash function HR with the counter as an argument. The disk set ID of the storage destination of the fingerprint set list head FPSLH is a fixed value shared by the entire system.

フィンガープリントセットＦＰＳを探索するには、まずフィンガープリントセットリストヘッドＦＰＳＬＨを読み出し、フィンガープリントセットリストＦＰＳＬのカウンタのリストを得る。このカウンタから、フィンガープリントセットリストＦＰＳＬを読み出し、フィンガープリントセットＦＰＳのカウンタのリストを得る。そしてこのカウンタから、フィンガープリントセットＦＰＳを順に得ることができる。 In order to search for the fingerprint set FPS, the fingerprint set list head FPSLH is first read to obtain a counter list of the fingerprint set list FPSL. From this counter, the fingerprint set list FPSL is read to obtain a list of fingerprint set FPS counters. From this counter, the fingerprint set FPS can be obtained in order.

また、各フィンガープリントセットＦＰＳにはカウンタが与えられているため、本実施形態では、複数のフィンガープリントセットＦＰＳの間での生成順序をカウンタの値から判定することが可能である。 Since each fingerprint set FPS is provided with a counter, in the present embodiment, the generation order among a plurality of fingerprint sets FPS can be determined from the value of the counter.
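As a non-limiting illustration, the search from the fingerprint set list head FPSLH through the fingerprint set lists FPSL to the fingerprint sets FPS may be sketched as follows. The dictionary `demo_kv`, the fixed key string, and the function name are assumptions made for this sketch; only the key prefixes FL and FS follow Table 1.

```python
FPSLH_KEY = "FPSLH"  # assumption: stands in for the system-wide fixed key


def iter_fingerprint_sets(kv):
    """Yield (counter, fingerprint set) pairs in generation order."""
    for list_counter in sorted(kv[FPSLH_KEY]):            # counters of each FPSL
        for set_counter in sorted(kv[("FL", list_counter)]):  # counters of each FPS
            yield set_counter, kv[("FS", set_counter)]


demo_kv = {
    "FPSLH": [1, 2],
    ("FL", 1): [10, 11],
    ("FL", 2): [12],
    ("FS", 10): "bloom-10",
    ("FS", 11): "bloom-11",
    ("FS", 12): "bloom-12",
}
found = list(iter_fingerprint_sets(demo_kv))
```

Sorting by counter in this sketch relies on the property that the counter values reflect the order in which the fingerprint sets were generated.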

以下、自由位置での重複排除処理の処理内容について述べる。 The processing contents of the deduplication processing at the free position will be described below.

自由位置での重複排除処理において、重複排除処理部３０１は、次の第１乃至第６の処理を実行する。 In the deduplication process at the free position, the deduplication processing unit 301 executes the following first to sixth processes.

第１の処理は、保存される第１のデータにおける第１の部分について、第１の部分的ハッシュ値を予め計算する処理である。また、第１のデータにおける第１の部分から始まる重複検出範囲に対応する第１のハッシュ値を予め計算する処理である。 The first process is a process for calculating in advance a first partial hash value for the first part of the first data to be stored. In addition, the first hash value corresponding to the duplication detection range starting from the first portion in the first data is calculated in advance.

第２の処理は、第１のデータの後に記憶される第２のデータの第２の部分について、第２の部分的ハッシュ値を計算する処理である。 The second process is a process of calculating a second partial hash value for the second part of the second data stored after the first data.

第３の処理は、第１の部分的ハッシュ値と第２の部分的ハッシュ値とが一致するか判断する処理である。 The third process is a process for determining whether the first partial hash value matches the second partial hash value.

The fourth process is performed when the first partial hash value and the second partial hash value match in the third process. It reads the first hash value corresponding to the duplication detection range starting from the first portion of the first data, and calculates a second hash value corresponding to the duplication detection range starting from the second portion of the second data.

第５の処理は、第１のハッシュ値と第２のハッシュ値とが一致するか判断する処理である。 The fifth process is a process for determining whether or not the first hash value and the second hash value match.

The sixth process is performed when the first hash value and the second hash value match in the fifth process. In that case, the duplication detection range starting from the second portion of the second data duplicates the duplication detection range starting from the first portion of the first data, and the sixth process therefore stores the data with the duplicate eliminated.
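As a non-limiting illustration, the two-stage comparison of the first to sixth processes may be sketched as follows. The sketch recomputes SHA-256 at every offset instead of using a rolling hash, and it handles only the case in which the partial portion lies at the start of the duplication detection range; both simplifications, together with the names `h`, `find_duplicate_offsets` and `known`, are assumptions made for this sketch.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def find_duplicate_offsets(data, known, partial_len, range_len):
    """known maps a stored partial hash to the stored hash of its full range."""
    offsets = []
    for off in range(len(data) - range_len + 1):
        partial = h(data[off:off + partial_len])      # second process
        first_hash = known.get(partial)               # third and fourth processes
        if first_hash is None:
            continue                                  # cheap check failed
        second_hash = h(data[off:off + range_len])    # fourth process
        if second_hash == first_hash:                 # fifth process
            offsets.append(off)                       # sixth process: duplicate found
    return offsets


first_data = b"0123456789abcdef"
known = {h(first_data[:4]): h(first_data)}            # first process (precomputed)
second_data = b"xyz" + first_data + b"!!"
hits = find_duplicate_offsets(second_data, known, partial_len=4, range_len=16)
```

The short partial hash acts as an inexpensive filter, so the more costly hash over the full duplication detection range is computed only at offsets where the partial hash already matches.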

Hereinafter, the deduplication processing at a free position will be described more specifically with reference to FIGS. 13 to 19.

図１３は、自由位置での重複排除処理を例示するフローチャートである。 FIG. 13 is a flowchart illustrating the deduplication processing at a free position.

ステップＳ１３０１において、フロントエンドサーバ２は、フラグメントサーバ３に対し、フィンガープリントセット２１２を送信する。送信のタイミングは任意でよく、例えばフィンガープリントセット２１２のサイズが一定量に達した場合に送信するとしてもよい。また、すべてのフラグメントサーバ３に対し、同じフィンガープリントセットを送信するとしてもよい。 In step S 1301, the front end server 2 transmits the fingerprint set 212 to the fragment server 3. The transmission timing may be arbitrary. For example, the transmission may be performed when the size of the fingerprint set 212 reaches a certain amount. Further, the same fingerprint set may be transmitted to all the fragment servers 3.

ステップＳ１３０２において、フラグメントサーバ３は、フロントエンドサーバ２よりフィンガープリントセットを受信し、記憶部３１に保存する。 In step S 1302, the fragment server 3 receives the fingerprint set from the front end server 2 and stores it in the storage unit 31.

ステップＳ１３０３において、フラグメントサーバ３の重複排除処理部３０１は、記憶部３１に保存したフィンガープリントセット３１１の内容を読み込み、一時的に保存する。 In step S1303, the deduplication processing unit 301 of the fragment server 3 reads the content of the fingerprint set 311 stored in the storage unit 31 and temporarily stores it.

The deduplication processing unit 301 calculates ½ fingerprints over a window that is half the fragment length, starting at the head of a fragment stored in the fragment storage unit 312 and shifting the window one byte at a time. The rolling hash shown in FIG. 6 may be applied to the calculation of the ½ fingerprints. When there are a plurality of front-end servers 2, the fragments stored in the fragment storage unit 312 may have been received from any of the front-end servers 2.

ステップＳ１３０４において、重複排除処理部３０１は、ステップＳ１３０３において計算されたフィンガープリントと、読み込んだフィンガープリントセットに保存されているフラグメントの前半又は後半部分の１／２フィンガープリントとを比較する。すなわち重複排除処理部３０１は、第１の重複探索を行う。読み込むフィンガープリントセットは複数でもよい。 In step S1304, the deduplication processing unit 301 compares the fingerprint calculated in step S1303 with the ½ fingerprint of the first half or the latter half of the fragment stored in the read fingerprint set. That is, the duplicate elimination processing unit 301 performs a first duplicate search. Multiple fingerprint sets may be read.

１／２フィンガープリントが一致しない場合、処理はステップＳ１３０４に戻り、１／２フィンガープリントが一致するまで、順次１バイトずつずらしながら１／２フィンガープリントの計算と比較とが行われる。 If the ½ fingerprint does not match, the process returns to step S1304, and the ½ fingerprint is calculated and compared while sequentially shifting by 1 byte until the ½ fingerprint matches.

If the ½ fingerprints match, in step S1305 the deduplication processing unit 301 reads, from its own fragment server or from another fragment server, the first and last fragments among the fragments that make up one stride starting at the matching position of the ½ fingerprint, and calculates the fingerprint for one stride from the matching position. Details of the fragment reading process will be described later with reference to FIG. 14.

For example, the rolling hash shown in FIG. 6 may be applied to the calculation of the fingerprint for one stride. With a rolling hash, when the fingerprint for one stride is calculated from a position shifted n bytes from the head of a fragment, it suffices to know the internal state of the rolling hash calculation for the one stride that starts at that fragment, and the contents of the two fragments that contain the beginning and the end of the stride to be calculated; the fragments in between are not needed for the calculation. Specifically, the deduplication processing unit 301 takes the roll sum state stored as fragment metadata as the initial value and calculates the fingerprint shifted by one byte as shown in FIG. 6. By repeating this n times, the fingerprint for one stride starting n bytes from the head of the fragment can be calculated. At each one-byte shift, the byte to be removed and the byte to be added can be obtained from the two fragments containing the beginning and the end described above.
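As a non-limiting illustration, the n-fold one-byte shift may be sketched with a polynomial rolling hash. The rolling hash of FIG. 6 is not reproduced in this part of the description, so the polynomial form, the constants `BASE` and `MOD`, and the function names are assumptions made for this sketch and may differ from the roll sum of the embodiment.

```python
BASE = 257
MOD = (1 << 61) - 1


def poly_hash(window: bytes) -> int:
    """Hash of a whole window, computed from scratch."""
    value = 0
    for byte in window:
        value = (value * BASE + byte) % MOD
    return value


def slide(value: int, window_len: int, outgoing: bytes, incoming: bytes) -> int:
    """Shift the window one byte at a time, len(outgoing) times."""
    top = pow(BASE, window_len - 1, MOD)  # weight of the oldest byte
    for out_byte, in_byte in zip(outgoing, incoming):
        value = (value - out_byte * top) % MOD      # remove the leading byte
        value = (value * BASE + in_byte) % MOD      # append the trailing byte
    return value


data = bytes(range(40))
stride_len, n = 24, 5
state = poly_hash(data[:stride_len])                # stored internal state
shifted = slide(state, stride_len, data[:n], data[stride_len:stride_len + n])
```

Only the first n bytes of the original window and the n bytes that follow it are passed to `slide`, which corresponds to needing only the fragments that contain the beginning and the end of the stride.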

ステップＳ１３０６において、重複排除処理部３０１は、ステップＳ１３０５において計算された１ストライドのフィンガープリントと、読み込んだフィンガープリントセットに保存されている１ストライドのフィンガープリントとを比較する。すなわち重複排除処理部３０１は、第２の重複探索を行う。 In step S1306, the deduplication processing unit 301 compares the fingerprint of one stride calculated in step S1305 with the fingerprint of one stride stored in the read fingerprint set. That is, the duplicate elimination processing unit 301 performs a second duplicate search.

ストライド単位でのフィンガープリントが一致した場合、一致した場所をピボットと呼ぶ。ピボットは、前述のとおり、内部ストライドＩＤ、オフセット、ピボット長を含む値で表される。オフセットはその一致位置がそのストライドの何バイト目かを表している。ピボット長は、ピボットが連続している場合の長さを示している。内部ストライドＩＤは、ピボットでオブジェクトを再分割した際に、何番目のストライドに属するかを示す番号である。内部ストライドについての詳細は、後述する。 When the fingerprints in the stride unit match, the matching location is called a pivot. As described above, the pivot is represented by a value including the internal stride ID, the offset, and the pivot length. The offset indicates how many bytes of the stride the matching position is. The pivot length indicates the length when the pivot is continuous. The internal stride ID is a number indicating to which stride it belongs when the object is subdivided by the pivot. Details of the internal stride will be described later.

ステップＳ１３０７において、重複排除処理部３０１は、得られたピボットを重複位置情報３１３へ保存する。 In step S 1307, the duplicate elimination processing unit 301 stores the obtained pivot in the duplicate position information 313.

ストライド単位でのフィンガープリントが一致しない場合、処理はステップＳ１３０４に戻る。 If the fingerprints in units of strides do not match, the process returns to step S1304.

In step S1308, the fragment server 3 transmits the pivots stored in the duplicate position information 313 to the front-end server 2 once they have reached a certain amount. Alternatively, the pivots may be transmitted, for example, at the time each pivot is stored in the duplicate position information 313.

In step S1309, on receiving the list of pivots, the front-end server 2 rearranges the strides so that each pivot position is at the head of a stride. Details of the rearrangement process will be described later with reference to FIGS. 18 and 19.

図１４は、フラグメントの第１の読み出し処理を例示する図である。 FIG. 14 is a diagram illustrating a first fragment reading process.

図１５は、フラグメントの第２の読み出し処理を例示する図である。 FIG. 15 is a diagram illustrating a second fragment reading process.

As described above, when the ½ fingerprint calculated by the deduplication processing unit 301 matches a ½ fingerprint stored in the fingerprint set, the deduplication processing unit 301 reads the first and last fragments among the fragments that make up one stride starting at the matching position, and calculates the fingerprint for one stride from the matching position based on the first and last fragments that were read. The deduplication processing unit 301 then compares the calculation result with the one-stride fingerprint stored in the fingerprint set.

ここで、フィンガープリントセットに保存されている１／２フィンガープリントのうち、フラグメントの前半部分の１／２フィンガープリントと一致したか、フラグメントの後半部分の１／２フィンガープリントと一致したかによって、１ストライド分のフィンガープリントを計算する際に読み出すフラグメントが異なる場合がある。 Here, out of the ½ fingerprints stored in the fingerprint set, whether it matches the ½ fingerprint of the first half of the fragment or the ½ fingerprint of the second half of the fragment, The fragment read when calculating the fingerprint for one stride may be different.

FIG. 14 illustrates the reading process when the match is with the ½ fingerprint of the first half. The upper part of FIG. 14 represents the fragments stored in the fragment server 3, and the lower part of FIG. 14 represents the fingerprints stored in the fingerprint set (corresponding to FIG. 11).

In FIG. 14, the fingerprint of the ½ fragment 1401 being searched by the deduplication processing unit 301 matches FP_a1 (the ½ fingerprint of the first half of the fragment A1). The deduplication processing unit 301 therefore calculates the fingerprint for one stride from the position corresponding to the head of the fragment A1, in order to check next for a match with FP_A (the fingerprint of the stride A). For this calculation, it is sufficient to have the fragment (or portion) containing the head and the fragment (or portion) containing the end of the stride of one stride length that starts at the position corresponding to the head of the fragment A1. The fragment containing the head of the stride starts at the same position as the ½ fragment 1401 that the deduplication processing unit 301 is already processing, so it does not need to be read. The deduplication processing unit 301 therefore reads only the fragment 1402 containing the end of the stride from another fragment server.

In FIG. 15, on the other hand, the fingerprint of the ½ fragment 1403 being searched by the deduplication processing unit 301 matches FP_a2 (the ½ fingerprint of the second half of the fragment A1). The deduplication processing unit 301 therefore calculates the fingerprint for one stride from the position corresponding to the head of the fragment A1, in order to check next for a match with FP_A (the fingerprint of the stride A). For this calculation, it is sufficient to have the fragment (or portion) containing the head and the fragment (or portion) containing the end of the stride of one stride length that starts at the position corresponding to the head of the fragment A1. The deduplication processing unit 301 therefore reads both the fragment 1404 containing the head of the stride and the fragment 1405 containing the end of the stride from other fragment servers.

図１６は、自由位置での重複排除処理の第１の例を示す図である。 FIG. 16 is a diagram illustrating a first example of deduplication processing at a free position.

図１７は、自由位置での重複排除処理の第２の例を示す図である。この図１７は、図１６における重複の検出をより理解しやすく表現している。 FIG. 17 is a diagram illustrating a second example of deduplication processing at a free position. FIG. 17 expresses the detection of duplication in FIG. 16 in an easy-to-understand manner.

図１６及び図１７に付した番号は、図１３の各ステップに付した番号と対応する。 The numbers given in FIGS. 16 and 17 correspond to the numbers given to the respective steps in FIG.

In the examples of FIGS. 16 and 17, the front-end server P1 transmits the fingerprint set 212 to the fragment server K1. The fragment server K1 stores the received fingerprint set 212 in the storage unit 31 as the fingerprint set 311.

Assume that, in the fragment server K1, the fragment A3 is the search target. The deduplication processing unit 301 of the fragment server K1 calculates rolling hashes of ½ fragment length over the fragment A3 in order from its head, and compares each with the fingerprints of the first-half or second-half fragment portions in the fingerprint set 311 that it holds.

When the deduplication processing unit 301 of the fragment server K1 finds an identical ½ fingerprint at a position n bytes from the head of the fragment A3, it reads the fragment B3, which is the fragment one stride later, from another fragment server K0.

The deduplication processing unit 301 of the fragment server K1 uses the fragment B3 together with the roll sum state stored in the fragment metadata of the fragment A3 to calculate the fingerprint of one stride starting at the point n bytes from the head of the fragment A3, and compares the calculated one-stride fingerprint with the stride fingerprints of the fingerprint set 311.

ストライドフィンガープリント中に、計算された１ストライドのフィンガープリントと一致する部分がある場合、フラグメントサーバＫ１の重複排除処理部３０１は、このピボット位置を追加する。なお、ピボット位置は、具体的には、ストライドの先頭であるフラグメントＡ１からのバイト数であってもよい。 If there is a portion in the stride fingerprint that matches the calculated one stride fingerprint, the deduplication processing unit 301 of the fragment server K1 adds this pivot position. Specifically, the pivot position may be the number of bytes from the fragment A1 that is the head of the stride.

The deduplication processing unit 301 of the fragment server K1 stores the pivot position in the duplicate position information 313. The fragment server K1 then transmits, to the front-end server P1, a counter value that identifies the object to be rearranged and the duplicate position information 313.

図１８は、フロントエンドサーバ２によるオブジェクトの再配置の第１の例を示す図である。 FIG. 18 is a diagram illustrating a first example of object rearrangement by the front-end server 2.

When the front-end server 2 receives, from the fragment server 3, the counter value that identifies the object to be rearranged and the duplicate position information 313, it refers to the disk selection index file 213 using the counter value and identifies the fragment server 3 that stores the pivot. The front-end server 2 also reads the target object in order to rearrange its strides. At this time, the front-end server 2 also reads, for each fragment, the reference count value that indicates whether the fragment has already been deduplicated. The reference count value is 1 in the initial state, that is, when no deduplication has been performed, and is incremented by 1 each time a reference is made for deduplication.

フロントエンドサーバ２は、読み出したオブジェクトを各ピボットに合わせてストライドへと分割する。この時生成されるストライドは内部ストライドと呼ばれ、実際の保存単位のストライドは外部ストライドと呼ばれる。外部ストライドの長さは必ずストライド長、すなわちフラグメント長の整数倍に等しいが、内部ストライドはストライド長よりも短くてよい。また、内部ストライドに対して順に与えられる番号を、内部ストライドＩＤと呼ぶ。ただし、内部ストライドは、すでに重複排除されているフラグメント（参照カウント値が２以上）については移動させないように生成される。 The front-end server 2 divides the read object into strides according to each pivot. The stride generated at this time is called the internal stride, and the actual storage unit stride is called the external stride. The length of the outer stride is always equal to the stride length, ie, an integral multiple of the fragment length, but the inner stride may be shorter than the stride length. A number given in order to the internal stride is called an internal stride ID. However, the internal stride is generated so that fragments that have already been deduplicated (reference count value of 2 or more) are not moved.
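As a non-limiting illustration, the division of an object into internal strides at the pivot positions may be sketched as follows. The rule used here, namely cutting at every pivot and then cutting every stride length measured from the preceding cut, is an assumption made for this sketch, and the sketch does not model the restriction on fragments whose reference count value is 2 or more. The function name and the tuple layout are hypothetical.

```python
def split_internal_strides(object_len, stride_len, pivots):
    """Return (internal stride ID, start offset, end offset) tuples."""
    cuts = sorted(p for p in set(pivots) if 0 < p < object_len)
    starts = [0] + cuts
    ends = cuts + [object_len]
    spans = []
    for start, end in zip(starts, ends):
        pos = start
        while pos < end:
            nxt = min(pos + stride_len, end)  # never longer than one stride
            spans.append((pos, nxt))
            pos = nxt
    # Internal stride IDs are assigned in order, starting from 1.
    return [(i + 1, s, e) for i, (s, e) in enumerate(spans)]


# Three strides of 16 bytes (four 4-byte fragments each), pivot inside the third fragment.
internal = split_internal_strides(object_len=48, stride_len=16, pivots=[10])
```

In this example the short internal strides at the beginning and the end of the object are candidates for being combined into one external stride.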

In the example of FIG. 18, it is assumed that there is a pivot in the middle of the fragment A3. First, in state (1-1), none of the fragments has yet been subject to deduplication, that is, every reference count value is 1. When a duplicate portion may exist over one stride from the pivot position in the fragment A3, that is, up to the middle of the fragment B3, the internal strides is1 to is4 are generated so that a stride boundary is created at the pivot position, as shown in state (1-2).

When several internal strides are each considerably shorter than the stride length, they can be combined into a single external stride. Such a stride is called a collective stride. State (1-3) illustrates a collective stride containing internal stride is1 and internal stride is4. In the collective stride of state (1-3), fragment C4 is padded. For a collective stride, it is evaluated whether a further internal stride can be inserted into the space remaining after the internal strides are combined. If the sum of the lengths of internal stride is1, internal stride is4, and the header required for combining them into the collective stride is shorter than the stride length, padding is applied to external stride os1.
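
The fit test and padding calculation for a collective stride can be sketched as below. The function name `pack_collective_stride` and the header size are assumptions; the text states only that a header is required, not its format or length.

```python
def pack_collective_stride(internal_lengths, stride_len, header_len):
    """Return (used_bytes, padding_bytes) if the header plus the given
    internal strides fit in one external stride, otherwise None."""
    used = header_len + sum(internal_lengths)
    if used > stride_len:
        return None  # the internal strides cannot share one external stride
    # Pad the remainder so the external stride is exactly stride_len long.
    return used, stride_len - used
```

For example, with a hypothetical stride length of 16384 bytes and a 64-byte header, internal strides of 9000 and 5000 bytes occupy 14064 bytes and need 2320 bytes of padding. Adding a third internal stride of 3000 bytes would exceed the stride length, so the function returns `None` for that combination.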

The front-end server 2 calculates erasure codes for external strides os1 to os3, divides each external stride into pieces of the fragment length, and stores them. Consequently, when the object is read, external strides os1 to os3 are read in order.

When the object is read, however, internal strides is1 to is4 contained in external strides os1 to os3 must be arranged in order of internal stride ID. Therefore, when the relative positions of internal strides is1 to is4 are changed in order to form a collective stride, read efficiency improves the closer their new positions are to their original positions.
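
The read-side reordering can be sketched as follows. The representation of an external stride as a list of `(internal_stride_id, payload)` pairs is an assumption made for illustration; the actual header layout is not specified here.

```python
def reassemble(external_strides):
    """Rebuild object bytes from external strides read in stored order.

    external_strides: list of external strides, each a list of
    (internal_stride_id, payload_bytes) pairs with padding already removed.
    """
    pieces = [piece for ext in external_strides for piece in ext]
    # Internal strides must be concatenated in internal stride ID order,
    # regardless of which external stride carried them.
    pieces.sort(key=lambda piece: piece[0])
    return b"".join(payload for _, payload in pieces)
```

If the first external stride is a collective stride holding internal strides 1 and 4, the payload of internal stride 4 has to be held back until internal strides 2 and 3 have been read, which is why placing internal strides close to their original positions helps read efficiency.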

FIG. 19 is a diagram illustrating a second example of object rearrangement by the front-end server 2.

In state (2-1) of FIG. 19, deduplication has already been applied to the stride composed of fragments C1 to C4, so the reference count values of fragments C1 to C4 are 2 or more. Padding is performed so that a stride boundary falls at the pivot position. In state (2-2), because the already deduplicated fragments C1 to C4 cannot be moved, padding data is inserted after fragment B4.

The front-end server 2 stores the rearranged strides. This storing process is the same as when an object is first stored, and the deduplication processing by each fragment server 3 is applied recursively. The above procedure realizes free-position deduplication.

In the information processing system 1 according to the present embodiment described above, the fragment servers 3 share the work of searching for duplicate positions. As a result, the load on the front-end server 2 does not increase even when the amount of stored data increases. The deduplication method described above can therefore be executed without loss of performance as storage capacity grows.

The information processing system 1 according to the present embodiment aims to realize low-cost, large-capacity, highly reliable storage, and has the following features.

In the information processing system 1 according to the present embodiment, storage capacity is increased or decreased by adding or removing fragment servers 3, and does not depend on the number or configuration of the front-end servers 2. The storage administrator therefore only needs to manage the number of fragment servers 3 in order to adjust storage capacity, which makes management easy.

In the information processing system 1 according to the present embodiment, the processing capacity of the entire system improves in proportion to the number of front-end servers 2. Therefore, when performance becomes insufficient, for example because the number of storage users has grown, the processing capacity of the entire system can be improved simply by adding front-end servers 2. The front-end server 2 is not the final storage location of data, and the number and configuration of the front-end servers 2 do not affect the number and configuration of the fragment servers 3. The storage administrator therefore only needs to manage the number of front-end servers 2 in order to adjust performance, which makes management easy.

In the information processing system 1 according to the present embodiment, both fixed-position deduplication and free-position deduplication are performed in units of strides. That is, even after deduplication, the fragments generated from one stride are stored on different disks, just as before deduplication. As a result, the reliability provided by the erasure code is maintained after deduplication, and fault tolerance is ensured.

In the information processing system 1 according to the present embodiment, the front-end server 2 and the fragment servers 3 are connected by an ordinary IP network. This makes it easy, for example, to supplement the front-end servers 2 temporarily with virtual machines on an in-house cloud during busy periods only. In addition, adding or removing a fragment server 3 requires only maintenance of the disk selection index file 213, so it is also easy to obtain the capacity the user wants. Because the system configuration can be expanded or reduced as needed, there is no need to install a large-scale system up front in anticipation of future growth in processing volume, and the introduction cost of the information processing system can be reduced.

In the information processing system 1 according to the present embodiment, limiting the components to, for example, two types, the front-end server 2 and the fragment server 3, makes it easy to analyze the cause of a failure.

In addition, data stored by users is ultimately stored only in the fragment servers 3. Because only one type of component holds data, recovery and analysis after a failure are simplified.

In the information processing system 1 according to the present embodiment, erasure codes are used to ensure reliability. Data storage efficiency can therefore be higher than with techniques used for large-scale storage that ensure reliability by replicating data (for example, RAID (Redundant Arrays of Inexpensive Disks) 1).
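
The storage-efficiency comparison can be made concrete with a short calculation. The parameters below (4 data fragments plus 2 parity fragments, and 2 mirrored copies) are hypothetical values chosen for illustration and are not taken from the embodiment.

```python
from fractions import Fraction


def erasure_code_efficiency(data_fragments, parity_fragments):
    """Fraction of raw capacity that holds user data under an erasure code."""
    return Fraction(data_fragments, data_fragments + parity_fragments)


def replication_efficiency(copies):
    """Fraction of raw capacity that holds user data under full replication."""
    return Fraction(1, copies)
```

Under these assumed parameters, `erasure_code_efficiency(4, 2)` is 2/3, whereas two-way mirroring as in RAID 1 gives `replication_efficiency(2)`, which is 1/2.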

In the present embodiment, whether data match is determined, for example, by comparing hash values. However, any other method may be used to determine whether data match, provided that it can be processed efficiently. For example, other information for determining whether data match may be used in place of hash values.
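
A two-stage hash comparison of the kind described, in which a hash over part of the data is checked first and a hash over the full duplicate detection size is checked only on a hit, can be sketched as follows. The names `find_duplicate`, `probe_len`, and `detect_len` are hypothetical. For simplicity the sketch recomputes SHA-256 at every offset; this stands in for the rolling hash mentioned in the claims and is not a rolling hash itself.

```python
import hashlib


def digest(data):
    """Hash helper; SHA-256 is an assumption, not the embodiment's choice."""
    return hashlib.sha256(data).hexdigest()


def find_duplicate(probe_hash, full_hash, incoming, probe_len, detect_len):
    """Return the first offset in `incoming` where a duplicate of the
    stored data starts, or None if there is no duplicate.

    probe_hash: hash of the first probe_len bytes of the stored data
    full_hash:  hash of the stored data of length detect_len
    """
    for offset in range(len(incoming) - detect_len + 1):
        # Stage 1: cheap check on a short window.
        if digest(incoming[offset:offset + probe_len]) != probe_hash:
            continue
        # Stage 2: confirm with the hash over the full detection size.
        if digest(incoming[offset:offset + detect_len]) == full_hash:
            return offset
    return None
```

The offset returned on a match plays the role of the duplicate position that is reported back so that strides can be realigned.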

Although several embodiments of the present invention have been described, these embodiments are presented by way of example and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and are included in the invention described in the claims and its equivalents.

DESCRIPTION OF SYMBOLS: 1 ... information processing system, 2 ... front-end server, 3 ... fragment server, 20, 30 ... controller, 21, 31 ... storage unit, 22, 32 ... transmission/reception unit, 23, 33 ... processor, 24, 34 ... memory, 25, 35 ... control unit, 201 ... object division unit, 202 ... erasure code calculation unit, 203 ... fingerprint calculation unit, 211 ... object storage unit, 212, 311 ... fingerprint set, 213 ... disk selection index file, 301 ... deduplication processing unit, 312 ... fragment storage unit, 313 ... duplicate position information.

Claims

An information processing system comprising a first information processing apparatus and a plurality of second information processing apparatuses, wherein
the first information processing apparatus includes:
a generation unit that generates a plurality of storage data of a storage unit size based on data;
a calculation unit that calculates a first hash value corresponding to first fragment data that is included in the data and has a duplicate detection size larger than the storage unit size, and calculates a second hash value corresponding to at least a part of first storage data included in the first fragment data; and
a transmission unit that transmits the first storage data, the first hash value, and the second hash value to a storage destination apparatus among the plurality of second information processing apparatuses, and
the storage destination apparatus includes:
a receiving unit that receives the first storage data, the first hash value, and the second hash value, and receives second storage data of the storage unit size generated by an external information processing apparatus based on storage target data; and
a processing unit that determines whether the second hash value corresponding to at least a part of the first storage data matches a third hash value corresponding to at least a part of the second storage data received by the receiving unit, determines, when the second hash value matches the third hash value, whether the first hash value corresponding to the first fragment data matches a fourth hash value corresponding to second fragment data that is stored in the plurality of second information processing apparatuses, includes the second storage data, and has the duplicate detection size, and detects duplication of the second fragment data when the first hash value matches the fourth hash value.

The information processing system according to claim 1, wherein
when the first hash value matches the fourth hash value, the processing unit transmits, to the external information processing apparatus, duplicate position information indicating the matching position in the storage target data,
the external information processing apparatus generates, based on the duplicate position information, a plurality of new fragment data for the storage target data such that the duplicate portion is contained in a single fragment data, generates a plurality of new storage data based on the plurality of new fragment data, and transmits the plurality of new storage data to the plurality of second information processing apparatuses,
the receiving unit receives the new storage data, and
the processing unit deduplicates the new storage data when the new storage data matches the first storage data.

The information processing system according to claim 1, wherein
the first information processing apparatus further includes:
a storage unit that stores selection information in which a reference hash value is associated with storage destination identification information determined for the reference hash value; and
a control unit that determines, based on the first hash value and the selection information, the second information processing apparatus that is the storage destination of each of the plurality of storage data, and
the transmission unit transmits each of the plurality of storage data to the determined second information processing apparatus.

The information processing system according to claim 1, wherein the processing unit calculates the third hash value corresponding to at least a part of the second storage data using a rolling hash function while shifting the calculation range within the second storage data.

The information processing system according to claim 1, wherein
the second hash value includes a hash value of a first half and a hash value of a second half of the first storage data included in the first fragment data, and
the processing unit determines whether the first hash value matches the fourth hash value when either the hash value of the first half or the hash value of the second half matches the third hash value, and detects duplication of the second fragment data when the first hash value matches the fourth hash value.

The information processing system according to claim 1, wherein, when the second hash value matches the third hash value, the processing unit calculates the fourth hash value for the second fragment data of the duplicate detection size starting from the matching position, and detects duplication of the second fragment data when the first hash value matches the fourth hash value.

The information processing system according to claim 1, wherein the generation unit generates the plurality of storage data including a plurality of divided data obtained by dividing the storage target data and error correction data corresponding to the plurality of divided data.

An information processing apparatus comprising:
a receiving unit that receives first storage data among a plurality of storage data of a storage unit size generated based on data, a first hash value corresponding to first fragment data that is included in the data and has a duplicate detection size larger than the storage unit size, and a second hash value corresponding to at least a part of the first storage data included in the first fragment data, and that receives second storage data of the storage unit size generated by an external information processing apparatus based on storage target data; and
a processing unit that determines whether the second hash value corresponding to at least a part of the first storage data matches a third hash value corresponding to at least a part of the second storage data received by the receiving unit, determines, when the second hash value matches the third hash value, whether the first hash value corresponding to the first fragment data matches a fourth hash value corresponding to second fragment data that is stored in a plurality of information processing apparatuses, includes the second storage data, and has the duplicate detection size, and detects duplication of the second fragment data when the first hash value matches the fourth hash value.

A program for causing a computer to function as:
a receiving unit that receives first storage data among a plurality of storage data of a storage unit size generated based on data, a first hash value corresponding to first fragment data that is included in the data and has a duplicate detection size larger than the storage unit size, and a second hash value corresponding to at least a part of the first storage data included in the first fragment data, and that receives second storage data of the storage unit size generated by an external information processing apparatus based on storage target data; and
a processing unit that determines whether the second hash value corresponding to at least a part of the first storage data matches a third hash value corresponding to at least a part of the second storage data received by the receiving unit, determines, when the second hash value matches the third hash value, whether the first hash value corresponding to the first fragment data matches a fourth hash value corresponding to second fragment data that is stored in a plurality of information processing apparatuses, includes the second storage data, and has the duplicate detection size, and detects duplication of the second fragment data when the first hash value matches the fourth hash value.