JP2018173802A

JP2018173802A - Storage system, data sorting method, and program

Info

Publication number: JP2018173802A
Application number: JP2017071246A
Authority: JP
Inventors: Yuji Abe (阿部　裕司)
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2017-03-31
Filing date: 2017-03-31
Publication date: 2018-11-08
Anticipated expiration: 2037-03-31
Also published as: JP6853506B2

Abstract

PROBLEM TO BE SOLVED: To provide a storage system, a data sorting method, and a program capable of improving read performance. SOLUTION: Data generated by dividing one piece of parent data into a plurality of pieces is called child data. A storage system sequentially stores, in a file in a storage device, child data whose division sources are mutually different parent data; stores in a storage device, in response to execution of a read process on the file, the position last read from the file by that read process; and performs a sort process, which sorts the child data within the file, on the child data stored from that position onward. SELECTED DRAWING: Figure 1

Description

The present invention relates to a storage system, and more particularly to a re-storing process performed in a storage system.

Some storage systems, such as backup storage, sort stored stream data so that it is laid out contiguously and then rearrange or re-store it, in order to improve read performance.

When the load on the storage system is high, the sort process takes a long time to complete. If the high load persists, a large backlog of sort processing can build up. If a read process is executed in that state, a large amount of data that has not yet been sorted remains in the storage area of the storage system, so the read takes longer than usual.

Patent Document 1 is cited as related art. To improve read performance for data written as multiple streams, Patent Document 1 discloses placing data having the same stream ID contiguously when stream data is written from a buffer memory to a data storage device.

Data that exceeds the buffer memory is fragmented, resulting in a data arrangement such as that shown in FIG. 9. F1 to F12 denote data storage areas, which store the data written by users in order from the top. ST1 to ST3 are stream IDs assigned to distinguish multiple sets of backup target data; in FIG. 9 each represents different data written by a user. The process of sorting data by stream ID from this state, to further improve read performance, is called a re-storing process.

Patent Document 1: Japanese Patent No. 5413948

In general, the re-storing process is performed when system resources are free. While there is external load, execution of the re-storing process is deferred. When the system load is high, a read process may be executed before the re-storing process has run.

Suppose that, in a storage system storing data as in FIG. 9, a read process occurs after the re-storing process has progressed partway, as shown in FIG. 10. In the example of FIG. 10, the re-storing process has been executed from the head of each of the data storage folders F1 to F12 and has completed up to POS_P. Meanwhile, the read process has advanced to POS_R. Data is stored up to POS_E and is awaiting the re-storing process.

In this situation, POS_R is ahead of POS_P. From here, POS_P does not normally overtake POS_R and reverse the order of progress. Consequently, every subsequent read is performed on data that has not yet been re-stored, which degrades the read performance of the storage system.

The present invention has been made in view of this situation, and the problem it addresses is to provide a storage system, a data sorting method, and a program capable of improving read performance.

To solve the above problem, the present invention provides, as one aspect, a storage system comprising, where data generated by dividing one piece of parent data into a plurality of pieces is called child data: means for sequentially storing, in a file in a storage device, child data whose division sources are mutually different parent data; means for storing in a storage device, in response to execution of a read process on the file, the position last read from the file by that read process; and means for performing a sort process, which sorts the child data within the file, on the child data stored from that position onward.

As another aspect, the present invention provides a method for sorting data in a file stored in a storage system, the method comprising, where data generated by dividing one piece of parent data into a plurality of pieces is called child data: a step of sequentially storing, in a file in a storage device, child data whose division sources are mutually different parent data; a step of storing in a storage device, in response to execution of a read process on the file, the position last read from the file by that read process; and a step of performing a sort process, which sorts the child data within the file, on the child data stored from that position onward.

As yet another aspect, the present invention provides a program for causing a computer to implement, where data generated by dividing one piece of parent data into a plurality of pieces is called child data: means for sequentially storing, in a file in a storage device, child data whose division sources are mutually different parent data; means for storing in a storage device, in response to execution of a read process on the file, the position last read from the file by that read process; and means for performing a sort process, which sorts the child data within the file, on the child data stored from that position onward.

According to the present invention, the read performance of a storage system can be improved.

FIG. 1 is a block diagram of a storage system 1 according to a first embodiment of the present invention.
FIG. 2 is a diagram for explaining the relationship between parent data D1, D2, D3 and their respective child data.
FIG. 3 is a diagram for explaining an example of the initial arrangement of child data in a file F stored in the storage device 31 of the storage system 1.
FIG. 4 is a diagram for explaining the re-storing process (sort process) executed in the storage system 1 after the file F has been read up to the read position POS_R.
FIG. 5 is a diagram for explaining an example of the data arrangement in the file F after the re-storing process has been executed in the storage system 1.
FIG. 6 is a block diagram of a storage system 10 according to a second embodiment of the present invention.
FIG. 7 is a flowchart for explaining the operation of the storage system 10.
FIG. 8 is a diagram for explaining the progress of the read process and the re-storing process after the value of the end/restart position POS_P of the re-storing process has been updated with the value of the already-read pointer position POS_R.
FIG. 9 is a diagram showing an example of the data arrangement in the storage devices when three stream data ST1, ST2, ST3 are stored in the storage system of Patent Document 1.
FIG. 10 is a diagram for explaining the progress of the read process and the re-storing process when the storage system of Patent Document 1 stores data in the arrangement of FIG. 9.

[First Embodiment]
A storage system 1 according to a first embodiment of the present invention will be described. Referring to FIG. 1, the storage system 1 includes a storage device 31, a distributed storage control unit 25, a read information storage unit 33, and a storage location management unit 26.

The distributed storage control unit 25 writes data input to the storage system 1 into the storage device 31. In doing so, it divides each piece of data into a plurality of pieces and writes them.

The relationship between data before and after division is as follows. Data before division is called parent data, and data after division is called child data. Referring to FIG. 2, parent data D1 consists of child data d1-1, d1-2, ..., d1-L (L is a natural number). The child data are obtained by cutting parent data D1 into pieces of a predetermined size in the order in which it was input. Similarly, parent data D2 consists of child data d2-1, d2-2, ..., d2-M (M is a natural number), and parent data D3 consists of child data d3-1, d3-2, ..., d3-N (N is a natural number).

Consider the case where multiple pieces of data are input to the storage system 1 in parallel. The distributed storage control unit 25 writes to the storage device 31 in the order of input; specifically, it follows the input order of the individual child data, not the input order of the parent data. Immediately after writing, the child data are arranged as shown in FIG. 3, for example. As shown in FIG. 3, the distributed storage control unit 25 writes the child data to the file F from its head toward its end, in the order in which the child data were input.
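The interleaved layout described above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and parallel input is modeled, as an assumption, by taking one child from each parent per round.

```python
from itertools import zip_longest

def split_into_children(parent_id, data, size):
    """Cut parent data into fixed-size child data, numbered from 1 in input order."""
    return [
        (parent_id, seq, data[off:off + size])
        for seq, off in enumerate(range(0, len(data), size), start=1)
    ]

def write_in_arrival_order(*streams):
    """Append child data to file F in arrival order.

    Assumption: parallel input is modeled as one child per parent per round.
    """
    file_f = []
    for round_ in zip_longest(*streams):
        file_f.extend(child for child in round_ if child is not None)
    return file_f

d1 = split_into_children(1, b"AAAAAA", 2)   # three children
d2 = split_into_children(2, b"BBBB", 2)     # two children
d3 = split_into_children(3, b"CC", 2)       # one child
file_f = write_in_arrival_order(d1, d2, d3)
labels = [f"d{parent}-{seq}" for parent, seq, _ in file_f]
```

Children of different parents end up adjacent in file F, which is the fragmentation that the re-storing process later removes.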

Now suppose that a read process occurs on the file F of FIG. 3 and, as shown in FIG. 4, the file F is read sequentially from its head up to the position POS_R. In response to execution of the read process, the storage location management unit 26 stores, in the read information storage unit 33, the position POS_R last read from the file by that read process.

At this time, the distributed storage control unit 25 performs the sort process that sorts the child data within the file, that is, the re-storing process, on the child data stored from the position recorded in the read information storage unit 33 onward. The re-storing process tidies the arrangement of the child data in the file F in order to improve the read speed of the file F.

FIG. 5 shows an example of the data arrangement in the file F after the re-storing process. In this example, from the position POS_R onward, the child data are first grouped by the parent data from which they were divided. Child data d1-9 and d1-10, divided from parent data D1, are stored contiguously; next, child data d2-6 and d2-7, divided from parent data D2, are stored contiguously; and finally child data d3-5, d3-6, ..., d3-9, divided from parent data D3, are stored contiguously. In addition, child data divided from the same parent data are arranged in an order corresponding to their input order.

In this way, when a read process occurs, the last read position POS_R is stored and the re-storing process is executed on the data stored from that position onward, which speeds up reads of the data stored from the read position POS_R onward.
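A minimal sketch of this re-storing step follows. It assumes the file is modeled as a list of (parent ID, sequence number) pairs and that POS_R is an index into that list; the source does not fix the unit of the position, and the function name is hypothetical.

```python
def restore_from(file_f, pos_r):
    """Sort only the child data stored at index pos_r and later.

    file_f is a list of (parent_id, seq) pairs in stored order; pos_r is an
    index into that list (an assumed unit for the read position).
    """
    if not 0 <= pos_r <= len(file_f):
        raise ValueError("pos_r outside file")
    already_read = file_f[:pos_r]
    # sorted() is stable, so children of one parent keep their stored
    # (input) order while being grouped by parent.
    regrouped = sorted(file_f[pos_r:], key=lambda child: child[0])
    return already_read + regrouped

file_f = [(1, 1), (2, 1), (3, 1), (1, 2), (3, 2), (2, 2), (1, 3), (3, 3)]
result = restore_from(file_f, pos_r=3)
```

The portion before POS_R is left untouched, so the sorting effort goes only to data that a reader has yet to reach.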

[Second Embodiment]
A storage system 10 according to a second embodiment of the present invention will be described. Referring to FIG. 6, the storage system 10 includes a data processing device 20 and a data storage device 30. The storage system 10 is a system composed of one or more computers. The data processing device 20 and the data storage device 30 may be implemented on a single computer, or each may be implemented as a separate computer. Furthermore, each of them may be implemented by a plurality of computers.

The storage system 10 is a content address storage system. A content address storage system divides data, makes it redundant, and distributes it across multiple storage devices. It also identifies the location where data is stored by a unique content address that is set according to the content of the data being stored.

The data processing device 20 controls the storing of data in, and the reading of data from, the data storage device 30. The data storage device 30 is a device equipped with storage media for storing data. When the data processing device 20 and the data storage device 30 are each implemented as a separate computer, each may be, for example, a server computer.

The data processing device 20 includes a stream ID (identifier) assigning unit 21, a block generation unit 22, a duplication check unit 23, a fragment generation unit 24, a distributed storage control unit 25, and a storage location management unit 26. The data storage device 30 includes a plurality of storage devices 31, a storage position information storage unit 32, and a read information storage unit 33.

On receiving target data A, which is a group of data, the stream ID assigning unit 21 assigns to it a stream ID, which is identification information that distinguishes the target data A. For example, it assigns stream ID = ST1 to target data A and stream ID = ST2 to target data B (not shown).

On receiving target data A, the block generation unit 22 divides it into block data D of a predetermined size (for example, 64 KB).

The block generation unit 22 also calculates, from the content of each block data D, a unique hash value H (content identification information) that represents the content of that block data D. For example, the hash value H is calculated from the content of the block data D using a preset hash function.

Furthermore, the block generation unit 22 passes the stream ID assigned to target data A on to each block data D generated by dividing that target data A.
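The three duties of the block generation unit (splitting, hashing, and passing on the stream ID) might look like the following sketch. The source only says "a preset hash function"; SHA-256 is an assumption here, as are the function name and the dictionary layout.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # 64 KB, the example size given in the text

def make_blocks(target_data, stream_id):
    """Split target data into blocks, hash each block, and tag it with the stream ID."""
    blocks = []
    for offset in range(0, len(target_data), BLOCK_SIZE):
        chunk = target_data[offset:offset + BLOCK_SIZE]
        blocks.append({
            "stream_id": stream_id,                   # inherited from the target data
            "hash": hashlib.sha256(chunk).digest(),   # assumed hash function
            "data": chunk,
        })
    return blocks

blocks = make_blocks(b"x" * (BLOCK_SIZE + 10), "ST1")
```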

The duplication check unit 23 uses the hash value H of a block data D of target data A to check whether that block data D is already stored in the storage devices 31.

Specifically, for each block data D that is already stored, its hash value H and a content address CA representing its storage location are registered in association with each other in an MFI (Main Fragment Index) file. Therefore, if the hash value H calculated for a block data D before storage is present in the MFI file, the duplication check unit 23 can determine that block data D with the same content is already stored.

In this case, the content address CA associated with the hash value H in the MFI file that matches the hash value H of the block data D to be stored is obtained from the MFI file. This content address CA is then returned as the content address CA of the block data D in the storage request.

As a result, the already stored data referenced by this content address CA is used as the block data D requested to be stored, and the block data D in the storage request does not need to be stored again.
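The duplicate check can be illustrated with a dictionary standing in for the MFI file. This is a sketch under stated assumptions: the hash function (SHA-256), the `CA-n` address format, and the function name are all hypothetical and not taken from the source.

```python
import hashlib

def store_block(mfi, storage, block):
    """Return (content_address, stored_now) for one block.

    mfi maps hash value H -> content address CA, standing in for the MFI file.
    storage is a list standing in for the storage devices.
    """
    h = hashlib.sha256(block).digest()   # assumed hash function
    ca = mfi.get(h)
    if ca is not None:
        return ca, False                 # duplicate: reuse the existing CA
    storage.append(block)
    ca = f"CA-{len(storage) - 1}"        # hypothetical address format
    mfi[h] = ca
    return ca, True

mfi, storage = {}, []
first = store_block(mfi, storage, b"block one")
second = store_block(mfi, storage, b"block two")
again = store_block(mfi, storage, b"block one")
```

The third call returns the address produced by the first call and writes nothing, which is the saving the text describes.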

The fragment generation unit 24 compresses block data D that the duplication check unit 23 has determined is not yet stored, and divides it into a plurality of fragment data of a predetermined size. For example, it divides the block data into nine fragment data, denoted D1 to D9.

The fragment generation unit 24 also generates redundant data so that the original block data can be restored even if some of the fragment data are lost. It adds the generated redundant data to the fragment data to form one data set.

For example, the fragment generation unit 24 adds three redundant data, denoted D10 to D12, to the nine fragment data D1 to D9 to generate one data set. In other words, the fragment generation unit 24 generates a data set corresponding to the block data D that the duplication check unit 23 has determined is not yet stored. Hereinafter, data generated by compressing and dividing block data D, such as the fragment data D1 to D9, and data generated to provide redundancy, such as the redundant data D10 to D12, may be referred to collectively as fragment data. The fragment data D1 to D12 are preferably created so as to have the same size.
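The source does not say which redundancy code is used. The sketch below substitutes a deliberately simple scheme, one XOR parity fragment per group of three data fragments, purely to show the shape of a 9 + 3 data set with equal-size fragments. It tolerates one lost fragment per group, which is weaker than a general erasure code; zlib compression and all names are likewise assumptions.

```python
import zlib
from functools import reduce

DATA_FRAGMENTS = 9
REDUNDANT_FRAGMENTS = 3

def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def make_data_set(block):
    """Compress a block, cut it into 9 equal fragments, and append 3 parity fragments."""
    packed = zlib.compress(block)
    frag_len = -(-len(packed) // DATA_FRAGMENTS)            # ceiling division
    padded = packed.ljust(frag_len * DATA_FRAGMENTS, b"\0")  # pad so all fragments match in size
    data = [padded[i * frag_len:(i + 1) * frag_len] for i in range(DATA_FRAGMENTS)]
    group = DATA_FRAGMENTS // REDUNDANT_FRAGMENTS
    parity = [reduce(xor_bytes, data[g * group:(g + 1) * group])
              for g in range(REDUNDANT_FRAGMENTS)]
    return data + parity                                     # D1..D9 followed by D10..D12

def restore_block(data_fragments):
    """Rebuild the block from D1..D9; padding after the zlib stream is ignored."""
    return zlib.decompressobj().decompress(b"".join(data_fragments))

payload = b"backup payload " * 1000
fragments = make_data_set(payload)
# Rebuild a lost D2 from D1, D3 and the parity fragment D10.
rebuilt_d2 = xor_bytes(xor_bytes(fragments[0], fragments[2]), fragments[9])
```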

Furthermore, the fragment generation unit 24 assigns to every generated fragment data D1 to D12 the stream ID assigned to the block data D from which the fragment data were derived, that is, the block data D that is restored from the fragment data D1 to D12.

The distributed storage control unit 25 distributes the fragment data constituting the data set generated by the fragment generation unit 24 across the storage areas formed in the storage devices 31.

For example, suppose the storage system 10 has twelve storage devices as the plurality of storage devices 31 and prepares one data storage file F on each. The storage system 10 then has twelve data storage files F1 to F12 as data storage areas. The distributed storage control unit 25 stores one of the fragment data D1 to D12 in each of these twelve data storage files F1 to F12. That is, it stores fragment data D1 in the data storage file F1 on one storage device 31, and fragment data D2 in the data storage file F2 on another storage device 31. The fragment data forming a single data set are thus stored in data storage files on separate storage devices.

When storing the fragment data of one data set on separate storage devices 31, the distributed storage control unit 25 stores them at the same position within each data storage file.

For example, as in the example above, suppose the storage system 10 has twelve storage devices 31 holding the data storage files F1 to F12, and the data set corresponding to a given block data D consists of twelve fragment data D1 to D12. Suppose the distributed storage control unit 25 stores fragment data D1 at the U-th bit (U is a natural number) from the head of the data storage file F1. It then also stores fragment data D2 at the U-th bit from the head of the data storage file F2. The same applies to the other fragment data D3 to D12.
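The same-offset rule can be sketched with twelve in-memory buffers standing in for the data storage files F1 to F12. The class and method names are hypothetical, and offsets are counted in bytes here for simplicity, whereas the text speaks of bit positions.

```python
NUM_FILES = 12

class DataStorageFiles:
    """Twelve append-only buffers standing in for data storage files F1..F12."""

    def __init__(self):
        self.files = [bytearray() for _ in range(NUM_FILES)]

    def append_data_set(self, fragments):
        """Write fragment i to file i; every fragment lands at the same offset."""
        if len(fragments) != NUM_FILES:
            raise ValueError("a data set needs exactly 12 fragments")
        if len({len(f) for f in fragments}) != 1:
            raise ValueError("fragments must be the same size")
        offset = len(self.files[0])          # identical for all files by construction
        for file_, fragment in zip(self.files, fragments):
            file_.extend(fragment)
        return offset

    def read_data_set(self, offset, length):
        """Read one fragment from each file at the shared offset."""
        return [bytes(f[offset:offset + length]) for f in self.files]

store = DataStorageFiles()
set_a = [bytes([i]) * 4 for i in range(NUM_FILES)]
set_b = [bytes([100 + i]) * 4 for i in range(NUM_FILES)]
offset_a = store.append_data_set(set_a)
offset_b = store.append_data_set(set_b)
```

Because one offset locates all twelve fragments, a reader that knows the position of any one fragment knows the position of the rest.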

When multiple target data are stored in the storage system 10 at the same time, the distributed storage control unit 25 may store data sets derived from different target data alternately in the data storage files. For example, suppose the storage system 10 accepts three target data during the same period, and the stream ID assigning unit 21 assigns ST1, ST2, and ST3 to them in order as stream IDs. The data storage files F1 to F12 then store fragment data as shown in FIG. 7, for example.

In FIG. 7, fragment data derived from the same target data have the same hatching. Fragment data with stream ID = ST1 are hatched with diagonal lines running from upper right to lower left. Fragment data with stream ID = ST2 are hatched with a grid pattern. Fragment data with stream ID = ST3 are hatched with a diagonal grid pattern.

Before storing the fragment data D1 to D12 in the data storage files F1 to F12, the distributed storage control unit 25 temporarily stores them in separate buffer memories, one per stream ID. It then stores the fragment data D1 to D12 from the buffer memories into the data storage files F1 to F12.

The distributed storage control unit 25 arranges fragment data with the same stream ID contiguously not only when storing fragment data in the data storage files, as described above, but also afterwards: it has a function for changing the storage positions of already stored fragment data so that fragment data with the same stream ID become contiguous. For example, when at least a predetermined amount of the resources of the storage system 10 itself is free, it moves the fragment data stored in the data storage files F1 to F12 so that those with the same stream ID are contiguous. Specifically, it can change the storage positions of all fragment data stored at the same storage position (one horizontal row) across the data storage files F1 to F12 together, as the data set 40 that those fragment data constitute. This turns a state in which data sets with different stream IDs are stored alternately into one in which data sets with the same stream ID are stored contiguously.
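Moving whole rows means the same permutation is applied to all twelve files. A minimal sketch, assuming each file is modeled as a list of fragments indexed by row and that the stream ID of each row is known (the function name is hypothetical):

```python
def regroup_rows(files, row_stream_ids):
    """Reorder rows (data sets) so that rows with the same stream ID are contiguous.

    files[f][r] is the fragment at row r of data storage file f.
    The same row permutation is applied to every file, so the fragments of
    one data set stay at a common position.
    """
    # sorted() is stable: rows of one stream keep their original relative order.
    order = sorted(range(len(row_stream_ids)), key=lambda r: row_stream_ids[r])
    new_files = [[column[r] for r in order] for column in files]
    new_ids = [row_stream_ids[r] for r in order]
    return new_files, new_ids

files = [[f"F{f}-row{r}" for r in range(6)] for f in range(1, 13)]
ids = ["ST1", "ST2", "ST3", "ST1", "ST2", "ST3"]
new_files, new_ids = regroup_rows(files, ids)
```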

The distributed storage control unit 25 also attaches the same identification information (WriteRecordSeqNum) to each of the fragment data D1 to D12 constituting a data set 40, so that they can be identified as belonging to the same data set 40, and stores them in the data storage files F1 to F12. It then examines the identification information at an arbitrary time, such as when resources of the storage system 10 are free. This makes it possible to check whether the fragment data constituting the same data set 40 are stored at the same storage position in the data storage files F1 to F12. Accordingly, if the fragment data stored at the same storage position in the data storage files F1 to F12 do not all contain the same identification information (WriteRecordSeqNum), the distributed storage control unit 25 corrects the storage positions so that the fragment data of each data set are stored at the same storage position, and re-stores the data.
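The consistency check on WriteRecordSeqNum amounts to comparing the identifiers found in one row across all files. The sketch below only detects misaligned rows; the data model (a list of `(seq_num, payload)` pairs per file) and the function name are assumptions.

```python
def misaligned_rows(files):
    """Return the row indexes whose fragments do not share one WriteRecordSeqNum.

    files is a list of per-file lists; each entry is (seq_num, payload).
    """
    bad = []
    for row, fragments in enumerate(zip(*files)):
        if len({seq_num for seq_num, _ in fragments}) != 1:
            bad.append(row)
    return bad

files = [[(seq, f"F{f}") for seq in (100, 101, 102)] for f in range(1, 13)]
before = misaligned_rows(files)
# Simulate two fragments of file F5 ending up in each other's rows.
files[4][1], files[4][2] = files[4][2], files[4][1]
after = misaligned_rows(files)
```

Rows reported by such a check are the ones whose storage positions would need to be corrected and re-stored.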

Furthermore, the distributed storage control unit 25 reads data currently stored in the storage devices 31.

The storage location management unit 26 generates and manages a content address CA, which represents the storage location of the fragment data D1 to D12 stored in the storage devices 31 as described above, that is, the storage location of the block data D restored from those fragment data. Specifically, it generates the content address CA by combining a part of the hash value H calculated from the content of the stored block data D (a short hash; for example, the first 8 bytes of the hash value H) with information representing the logical storage position. It then returns this content address CA to the file system in the storage system 10. The file system manages identification information such as the file name of the target data in association with the content address CA.
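One way to picture a content address built from a short hash and a logical position is shown below. The 8-byte short hash follows the example in the text; the hash function (SHA-256), the 8-byte big-endian encoding of the position, and the function names are assumptions made for the sketch.

```python
import hashlib
import struct

SHORT_HASH_BYTES = 8  # "first 8 bytes of the hash value H" in the text's example

def make_content_address(block, logical_position):
    """Combine the short hash of a block with its logical storage position."""
    short_hash = hashlib.sha256(block).digest()[:SHORT_HASH_BYTES]   # assumed hash
    return short_hash + struct.pack(">Q", logical_position)          # assumed encoding

def parse_content_address(content_address):
    """Split a content address back into (short_hash, logical_position)."""
    short_hash = content_address[:SHORT_HASH_BYTES]
    (position,) = struct.unpack(">Q", content_address[SHORT_HASH_BYTES:])
    return short_hash, position

ca = make_content_address(b"block data D", 42)
short_hash, position = parse_content_address(ca)
```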

The storage location management unit 26 also associates the content address CA of a block data D with the hash value H of that block data D, and each storage node 10B manages the association in an MFI file. In this way, the content address CA is stored in the storage position information storage unit 32 in association with information identifying the file, the hash value H, and so on.

The storage location management unit 26 further controls reading of the target data stored as described above. For example, when the storage system 10 receives a read request specifying a particular file, the content address CA corresponding to the requested file, consisting of a short hash (a part of the hash value) and logical position information, is first identified through the file system. The storage location management unit 26 then checks whether that content address CA is registered in the MFI file. If it is not registered, the requested data is not stored, so an error is returned.

On the other hand, if the content address CA of the read request is registered, the storage location designated by the content address CA is identified, and each fragment data stored at that location is read out as the requested data. At this time, if the data storage files F1 to F12 in which the fragments are stored and the storage position of one fragment data within its data storage file are known, the storage positions of the other fragment data can be identified from the same storage position.
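The lookup described in the two paragraphs above can be sketched as follows. The sketch assumes a simplified model that is not in the description: the MFI file is represented by a dictionary from content address to storage position, and each data storage file is a list indexed by that position. The names `read_fragments` and `ContentAddressNotRegistered` are hypothetical.

```python
from typing import Dict, List


class ContentAddressNotRegistered(KeyError):
    """Raised when the requested content address is not in the index."""


def read_fragments(ca: bytes,
                   mfi_index: Dict[bytes, int],
                   data_files: List[List[bytes]]) -> List[bytes]:
    """Return the fragment stored at the same position in every data file."""
    if ca not in mfi_index:
        # The requested data is not stored, so an error is returned.
        raise ContentAddressNotRegistered(ca.hex())
    position = mfi_index[ca]
    # Knowing the position in one file gives the position in all the others.
    return [data_file[position] for data_file in data_files]


# Twelve toy data storage files (F1 to F12), each holding fragments of two blocks.
data_files = [[f"A{i}".encode(), f"B{i}".encode()] for i in range(1, 13)]
mfi_index = {b"ca-A": 0, b"ca-B": 1}
fragments = read_fragments(b"ca-B", mfi_index, data_files)
print(fragments[:3])
```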

The storage location management unit 26 then restores the block data D from the fragment data read in response to the read request. Furthermore, the storage location management unit 26 concatenates a plurality of restored block data D, restores them into a group of data such as file A, and returns it to the requester.

The data storage device 30 includes a plurality of storage devices 31, a storage location information storage unit 32, and a read information storage unit 33.

Each storage device 31 stores the data to be stored in the storage system 10. The storage device 31 is, for example, a hard disk drive or an SSD (Solid State Drive).

The storage location information storage unit 32 is a storage device that stores the storage locations of the stored data. It is, for example, a hard disk drive or an SSD.

The read information storage unit 33 stores a read pointer POS_R, which is position information indicating the position at which reading of data by the read process was interrupted. The read information storage unit 33 also stores the position at which the re-storing process ended, that is, the position POS_P at which the next re-storing process starts.

Next, the operation of the storage system 10 will be described. The storage system 10 holds position information indicating the position up to which the read process has already finished in the data stored in the storage devices 31. Based on this position information, among the data that has not yet been re-stored, the re-storing priority is lowered for data that has already been read, and data that has not yet been read is re-stored preferentially. For unsorted data, that is, data that has not yet been re-stored, the position information of the data is managed, for example, in an ascending-order list.
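One way to picture this prioritization is the sketch below. It assumes that positions are integer offsets that grow toward the end of the file and that a position at or after POS_R counts as not yet read; the function name `order_for_restore` is hypothetical, and the description does not prescribe this particular data structure beyond an ascending list.

```python
import bisect
from typing import List


def order_for_restore(unsorted_positions: List[int], pos_r: int) -> List[int]:
    """Return positions in re-store order: unread data first, read data last.

    unsorted_positions must already be in ascending order.
    """
    # Index of the first position that has not been read yet (>= pos_r).
    split = bisect.bisect_left(unsorted_positions, pos_r)
    unread = unsorted_positions[split:]
    already_read = unsorted_positions[:split]
    return unread + already_read


print(order_for_restore([10, 20, 30, 40, 50], pos_r=30))
```

Because the list is kept in ascending order, a single binary search is enough to separate the already-read positions from the unread ones.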

The operation will be described with reference to FIG. 7. The distributed storage control unit 25 starts a read process targeting all data in the storage devices 31 (step S1). An example of such a process is the read from the master site performed during replication. In this type of read process, after reading up to a certain position in the storage device 31 has finished, the next read continues from that position.

In replication, when data is written to a file on the master site, the differential data resulting from the write is transferred from the master site to the replica site. Because only differences are transferred, data that has once been read for replication is not referenced again. Therefore, data in a file on the master site that has been subjected to the read process is not subjected to the read process again.

Next, the distributed storage control unit 25 ends the read process (step S2). The pointer indicating the position at this time is called the read pointer, and its position information is denoted POS_R.

Next, the storage location management unit 26 stores the read pointer (position information) POS_R in the storage location information storage unit 32 (step S3).

Next, the storage location management unit 26 reads the position information POS_R of the read pointer and the end/restart position POS_P of the re-storing process from the storage location information storage unit 32 (step S4).

Next, the storage location management unit 26 compares the position POS_R and the position POS_P read from the storage location information storage unit 32 (step S5).

If the position POS_R is ahead of the position POS_P in the direction in which the read process advances, that is, if the position POS_R at which the next read process starts is closer to the end of the data storage files F1 to F12 than the position POS_P at which the re-storing process resumes, the storage location management unit 26 updates the value of the end/restart position POS_P of the re-storing process stored in the storage location information storage unit 32 with the value of the read pointer position POS_R (step S6). In other words, the position POS_P at which the re-storing process resumes is advanced to the position POS_R at which the next read process starts.

At this time, as shown in FIG. 8, the restart position POS_P of the re-storing process has caught up with the next start position POS_R of the read process, which had been ahead. As a result, in the data storage files F1 to F12, the re-storing process is performed preferentially on the data stored at or after the next start position of the read process.

If the position POS_P is ahead of the position POS_R in the direction in which the read process advances, that is, if the position POS_R at which the next read process starts is closer to the beginning of the data storage files F1 to F12 than the position POS_P at which the re-storing process resumes, nothing is done.
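The comparison and update in steps S5 and S6 can be summarized in a few lines. The sketch assumes that POS_R and POS_P are integer offsets that increase toward the end of the data storage files; the function name `next_restore_position` is hypothetical.

```python
def next_restore_position(pos_r: int, pos_p: int) -> int:
    """Return the position from which the re-storing process should resume."""
    if pos_r > pos_p:
        # Step S6: the read pointer is ahead, so skip POS_P forward to POS_R.
        return pos_r
    # Otherwise POS_P is already at or ahead of POS_R: nothing is done.
    return pos_p


print(next_restore_position(pos_r=700, pos_p=300))  # read pointer is ahead
print(next_restore_position(pos_r=200, pos_p=300))  # re-store position is ahead
```

Under this assumption the rule is equivalent to taking the larger of the two positions, so POS_P never moves backward.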

Next, the distributed storage control unit 25 starts the re-storing process from the position POS_P stored in the storage location information storage unit 32 (step S7).
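A minimal sketch of a re-storing pass that starts at POS_P is given below. It follows the sort described in Appendices 2 and 3 (child data from the same parent become continuous and follow their order relation), but the in-memory list model, the `Child` record, the use of a list index as the position, and the choice to keep parents in order of first appearance are all assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple


class Child(NamedTuple):
    parent: str  # hypothetical identifier of the parent data
    seq: int     # hypothetical order of this child within its parent


def restore_from(entries: List[Child], pos_p: int) -> List[Child]:
    """Sort only the entries at or after pos_p; leave the head untouched."""
    head, tail = entries[:pos_p], entries[pos_p:]
    # Keep parents in order of first appearance so the sort only regroups children.
    first_seen: Dict[str, int] = {}
    for index, child in enumerate(tail):
        first_seen.setdefault(child.parent, index)
    tail_sorted = sorted(tail, key=lambda c: (first_seen[c.parent], c.seq))
    return head + tail_sorted


entries = [Child("A", 0), Child("B", 0), Child("A", 1), Child("B", 1),
           Child("C", 0), Child("A", 2), Child("C", 1), Child("B", 2)]
result = restore_from(entries, pos_p=4)
print([f"{c.parent}{c.seq}" for c in result])
# prints ['A0', 'B0', 'A1', 'B1', 'C0', 'C1', 'A2', 'B2']
```

The first four entries, which lie before POS_P, stay interleaved; only the tail is regrouped.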

For example, regarding steps S6 and S7, when the storage system 10 is the storage at the master site of a replication, the replication job, on finishing, notifies the read information storage unit 33 of the position it read last. The replication job also notifies the storage location management unit 26 that the read information has been updated. On receiving this notification, the storage location management unit 26 refers to the read information storage unit 33 and sets the start position of the re-storing process to the position read from the read information storage unit 33. The distributed storage control unit 25 then starts the re-storing process from the newly set position.
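The notification flow in this example might look like the following sketch. The class and method names are hypothetical stand-ins for the read information storage unit 33 and the storage location management unit 26, and the sketch follows this paragraph literally: it sets the start position without repeating the comparison of step S5.

```python
class ReadInfoStore:
    """Stands in for the read information storage unit 33."""

    def __init__(self) -> None:
        self.pos_r = 0  # last position read by the replication job


class StorageLocationManager:
    """Stands in for the storage location management unit 26."""

    def __init__(self, read_info: ReadInfoStore) -> None:
        self.read_info = read_info
        self.restore_start = 0  # start position of the re-storing process

    def on_read_info_updated(self) -> None:
        # Refer to the read information and reset the re-store start position.
        self.restore_start = self.read_info.pos_r


def finish_replication_job(last_read_pos: int,
                           read_info: ReadInfoStore,
                           manager: StorageLocationManager) -> int:
    """Record the last read position, notify the manager, return the new start."""
    read_info.pos_r = last_read_pos
    manager.on_read_info_updated()
    return manager.restore_start


read_info = ReadInfoStore()
manager = StorageLocationManager(read_info)
start = finish_replication_job(1200, read_info, manager)
print(start)
```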

According to this embodiment, the read performance of a content address storage system can be improved.

Some or all of the above embodiment can also be described as in the following appendices, but is not limited to them.

(Appendix 1)
A storage system comprising:
means for sequentially storing, in a file in a storage device, child data whose division sources are mutually different parent data, where data generated by dividing one piece of parent data into a plurality of pieces is referred to as child data;
means for storing in a storage device, in response to execution of a read process on the file, the position that the read process last read from the file; and
means for performing a sort process, which sorts the child data within the file, on the child file stored at or after the position.

(Appendix 2)
The storage system according to appendix 1, wherein the sort process sorts the child data so that child data having the same parent data as a division source is continuous in the file.

(Appendix 3)
The storage system according to appendix 1 or appendix 2, wherein
the child data has an order relation with the other child data generated by dividing the same parent data, and
the sort process is performed so that child data having the same parent data as a division source is continuous in the file in accordance with the order relation.

(Appendix 4)
The content address storage system according to any one of appendix 1 to appendix 3.

(Appendix 5)
The storage system according to any one of appendix 1 to appendix 4, wherein the storage system is a storage system provided at the master site in replication from a master site to a replica site.

(Appendix 6)
A data sorting method comprising:
sequentially storing, in a file in a storage device, child data whose division sources are mutually different parent data, where data generated by dividing one piece of parent data into a plurality of pieces is referred to as child data;
storing in a storage device, in response to execution of a read process on the file, the position that the read process last read from the file; and
performing a sort process, which sorts the child data within the file, on the child file stored at or after the position.

(Appendix 7)
The method according to appendix 6, wherein the sort process sorts the child data so that child data having the same parent data as a division source is continuous in the file.

(Appendix 8)
The method according to appendix 6 or appendix 7, wherein
the child data has an order relation with the other child data generated by dividing the same parent data, and
the sort process is performed so that child data having the same parent data as a division source is continuous in the file in accordance with the order relation.

(Appendix 9)
The method according to any one of appendix 6 to appendix 8, wherein the storage system is a content address storage system.

(Appendix 10)
The method according to any one of appendix 6 to appendix 9, wherein the storage system is a storage system provided at the master site in replication from a master site to a replica site.

(Appendix 11)
A program for causing a computer to implement:
means for sequentially storing, in a file in a storage device, child data whose division sources are mutually different parent data, where data generated by dividing one piece of parent data into a plurality of pieces is referred to as child data;
means for storing in a storage device, in response to execution of a read process on the file, the position that the read process last read from the file; and
means for performing a sort process, which sorts the child data within the file, on the child file stored at or after the position.

(Appendix 12)
The program according to appendix 11, wherein the sort process sorts the child data so that child data having the same parent data as a division source is continuous in the file.

(Appendix 13)
The program according to appendix 11 or appendix 12, wherein
the child data has an order relation with the other child data generated by dividing the same parent data, and
the sort process is performed so that child data having the same parent data as a division source is continuous in the file in accordance with the order relation.

1, 10 Storage system
21 Stream ID assigning unit
22 Block generation unit
23 Duplicate check unit
24 Fragment generation unit
25 Distributed storage control unit
26 Storage location management unit
30 Data storage device
31 Storage device
32 Storage location information storage unit
33 Read information storage unit

Claims

A storage system comprising:
means for sequentially storing, in a file in a storage device, child data whose division sources are mutually different parent data, where data generated by dividing one piece of parent data into a plurality of pieces is referred to as child data;
means for storing in a storage device, in response to execution of a read process on the file, the position that the read process last read from the file; and
means for performing a sort process, which sorts the child data within the file, on the child file stored at or after the position.

The storage system according to claim 1, wherein the sorting process sorts the child data so that the child data having the same parent data as a division source is continuous in the file.

The child data has an order relationship with other child data generated by dividing the same parent data,
The sorting process is performed so that the child data having the same parent data as a division source is continuous according to the order relation in the file.
The storage system according to claim 1 or 2.

The content address storage system according to any one of claims 1 to 3.

The storage system according to any one of claims 1 to 4, wherein the storage system is a storage system included in the master site that performs replication from a master site to a replica site.

A data sorting method comprising:
sequentially storing, in a file in a storage device, child data whose division sources are mutually different parent data, where data generated by dividing one piece of parent data into a plurality of pieces is referred to as child data;
storing in a storage device, in response to execution of a read process on the file, the position that the read process last read from the file; and
performing a sort process, which sorts the child data within the file, on the child file stored at or after the position.

The method according to claim 6, wherein the sorting process sorts the child data so that the child data having the same parent data as a division source is continuous in the file.

The child data has an order relationship with other child data generated by dividing the same parent data,
The sorting process is performed so that the child data having the same parent data as a division source is continuous according to the order relation in the file.
The method according to claim 6 or claim 7.

A program for causing a computer to implement:
means for sequentially storing, in a file in a storage device, child data whose division sources are mutually different parent data, where data generated by dividing one piece of parent data into a plurality of pieces is referred to as child data;
means for storing in a storage device, in response to execution of a read process on the file, the position that the read process last read from the file; and
means for performing a sort process, which sorts the child data within the file, on the child file stored at or after the position.

The program according to claim 9, wherein the sorting process sorts the child data so that the child data having the same parent data as a division source is continuous in the file.