JP6281333B2 - Storage system

Info

Publication number: JP6281333B2
Application number: JP2014047384A
Authority: JP
Inventors: 利朗中島
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2014-03-11
Filing date: 2014-03-11
Publication date: 2018-02-21
Anticipated expiration: 2034-03-11
Also published as: JP2015170345A

Description

The present invention relates to a storage system and, more particularly, to a storage system that eliminates duplicate storage of data having the same content.

In recent years, with the development and spread of computers, many kinds of information have been digitized. Storage devices such as magnetic tapes and magnetic disks are used to store such digital data. Because the amount of data to be stored grows every day and becomes enormous, large-capacity storage systems are required. Reliability is also required while keeping the cost of storage devices down, and the data must be easy to retrieve later. As a result, there is demand for a storage system that can increase storage capacity and performance automatically, eliminate duplicate storage to reduce storage costs, and provide high redundancy.

In response to this situation, content address storage systems have been developed in recent years. A content address storage system distributes data across a plurality of storage devices, and the location where each piece of data is stored is identified by a unique content address determined from the content of that data. Some content address storage systems also divide given data into a plurality of fragments, add further fragments containing redundant data, and store these fragments on a plurality of storage devices, respectively.

In such a content address storage system, specifying a content address later causes the data stored at the location identified by that address, that is, the fragments, to be read out, and the original data as it was before division can be restored from the fragments.

The content address is generated from a value that is unique to the content of the data, for example a hash value of the data. Duplicate data with the same content can therefore be obtained by referring to the data already stored at the same location. As a result, duplicate data need not be stored separately, and eliminating duplicate storage reduces the storage capacity consumed.
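The addressing idea can be illustrated with a minimal in-memory sketch. It assumes SHA-256 as the hash (the text only says "for example, a hash value"), and the class name `ContentAddressStore` is hypothetical; it is not the patented system's interface.

```python
import hashlib


class ContentAddressStore:
    """Toy content-addressed store: a chunk's address is derived from its hash."""

    def __init__(self) -> None:
        self._chunks: dict[str, bytes] = {}

    @staticmethod
    def address_of(data: bytes) -> str:
        # SHA-256 is an assumed choice of hash function.
        return hashlib.sha256(data).hexdigest()

    def put(self, data: bytes) -> str:
        addr = self.address_of(data)
        # Identical content yields the same address, so it is stored only once.
        self._chunks.setdefault(addr, data)
        return addr

    def get(self, addr: str) -> bytes:
        return self._chunks[addr]

    def __len__(self) -> int:
        return len(self._chunks)
```

Writing the same content twice returns the same address and leaves a single stored copy.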

In particular, a storage system with the duplicate-storage elimination function described above divides the data to be written, such as a file, into a plurality of blocks of a predetermined size, compresses them, and writes them to the storage devices. Eliminating duplicate storage in units of these blocks, rather than whole files, increases the proportion of data found to be duplicate and so further reduces the storage capacity consumed.
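Block-level deduplication with per-block compression can be sketched as follows. This is an illustration under stated assumptions: SHA-256 digests as block keys, zlib for compression, a plain dict as the block store, and the hypothetical helper names `split_fixed`, `dedup_write`, and `read_back`. The 64 KB block size is the example value given later in this description.

```python
import hashlib
import zlib

BLOCK_SIZE = 64 * 1024  # 64 KB, the example block size given later in the text


def split_fixed(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def dedup_write(data: bytes, store: dict[bytes, bytes], size: int = BLOCK_SIZE) -> list[bytes]:
    """Write data block by block, skipping blocks whose content is already stored.

    Returns the list of block digests (a "recipe") from which the data can be rebuilt.
    """
    recipe = []
    for block in split_fixed(data, size):
        digest = hashlib.sha256(block).digest()
        if digest not in store:
            store[digest] = zlib.compress(block)  # only new blocks are compressed and written
        recipe.append(digest)
    return recipe


def read_back(recipe: list[bytes], store: dict[bytes, bytes]) -> bytes:
    """Reassemble the original data from its recipe."""
    return b"".join(zlib.decompress(store[d]) for d in recipe)
```

A file whose first and third blocks are identical produces three recipe entries but only two stored blocks.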

Patent Document 1: JP 2013-47933 A

Patent Document 1 proposes improving the deduplication ratio by controlling where stored data is placed in a storage system that has volumes in multiple tiers. Meanwhile, the storage media that make up a storage system include low-speed devices such as HDDs (hard disk drives), which are slow to read and write but inexpensive, and high-speed devices such as SSDs (solid state drives), which are expensive but fast to read and write. If the storage destination of data is chosen with only the deduplication ratio in mind, data may end up on a device that is unsuitable for its use or attributes. This causes problems such as processing delays when data is placed on a low-speed device, lower utilization efficiency of the high-speed devices, and more data migration between devices, which in turn degrade the performance of the storage system.

An object of the present invention is therefore to provide a storage system that solves the problem of performance degradation in storage systems having a deduplication function.

A storage system according to one aspect of the present invention includes:
a duplication determination unit that determines whether storage target data is in a duplicate state, that is, already stored in a storage device;
a storage destination determination unit that determines the storage destination of non-duplicate data, which is storage target data that is not a duplicate; and
a data storage control unit that stores the non-duplicate data in the storage device determined as its storage destination and that, for duplicate data, which is storage target data that is a duplicate, refers to other data already stored in a storage device with the same content as the duplicate data and uses that other data as the duplicate data,
wherein the storage destination determination unit determines the storage destination of the duplicate data judged, according to a preset criterion, to be related to the non-duplicate data, and determines the storage destination of the non-duplicate data based on the result of that determination.
The configuration is as follows.
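The claim leaves the "preset criterion" for relatedness open. Purely as a hypothetical illustration, the sketch below treats the duplicate blocks of the same file as the related data and places that file's new blocks on the tier that already holds most of those duplicates, with a fallback when no duplicates exist. The names `Block`, `choose_tier`, and `place`, the majority rule, and the tie-break toward the faster tier are all assumptions, not the patent's method.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    digest: bytes
    is_duplicate: bool
    tier: Optional[int] = None  # device number of the stored copy (1 = fastest); None if new


def choose_tier(file_blocks: list[Block], default_tier: int) -> int:
    """Pick a tier for a file's non-duplicate blocks from where its duplicates live.

    Hypothetical criterion: "related" duplicate data = duplicate blocks of the same
    file; the new blocks follow the tier that holds most of them.
    """
    counts = Counter(b.tier for b in file_blocks if b.is_duplicate and b.tier is not None)
    if not counts:
        return default_tier  # nothing related is stored yet: fall back to the default
    # Most common tier wins; on a tie, prefer the faster (lower-numbered) tier.
    return min(counts, key=lambda t: (-counts[t], t))


def place(file_blocks: list[Block], default_tier: int) -> dict[bytes, int]:
    """Return the chosen storage tier for each non-duplicate block of one file."""
    tier = choose_tier(file_blocks, default_tier)
    return {b.digest: tier for b in file_blocks if not b.is_duplicate}
```

Under this rule, a new block that sits among duplicates held on the fast tier is also written to the fast tier, which matches the motivation given later for minor-version differences of an OS kernel.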

A program according to another aspect of the present invention causes an information processing device to implement:
a duplication determination unit that determines whether storage target data is in a duplicate state, that is, already stored in a storage device;
a storage destination determination unit that determines the storage destination of non-duplicate data, which is storage target data that is not a duplicate; and
a data storage control unit that stores the non-duplicate data in the storage device determined as its storage destination and that, for duplicate data, which is storage target data that is a duplicate, refers to other data already stored in a storage device with the same content as the duplicate data and uses that other data as the duplicate data,
wherein the storage destination determination unit determines the storage destination of the duplicate data judged, according to a preset criterion, to be related to the non-duplicate data, and determines the storage destination of the non-duplicate data based on the result of that determination.

A data storage method according to another aspect of the present invention:
determines whether storage target data is in a duplicate state, that is, already stored in a storage device;
determines the storage destination of non-duplicate data, which is storage target data that is not a duplicate; and
performs data storage processing that stores the non-duplicate data in the storage device determined as its storage destination and that, for duplicate data, which is storage target data that is a duplicate, refers to other data already stored in a storage device with the same content as the duplicate data and uses that other data as the duplicate data,
wherein the method determines the storage destination of the duplicate data judged, according to a preset criterion, to be related to the non-duplicate data, and determines the storage destination of the non-duplicate data based on the result of that determination.

Configured as described above, the present invention can improve the performance of a storage system having a deduplication function.

FIG. 1 is a block diagram showing an outline of the configuration of the storage system in Embodiment 1 of the present invention.
FIG. 2 is a functional block diagram showing the configuration of the storage system in Embodiment 1 of the present invention.
FIG. 3 is a diagram showing an example of data stored in the storage system disclosed in FIG. 2.
FIG. 4 is a diagram showing an example of data stored in the storage system disclosed in FIG. 2.
FIG. 5 is an explanatory diagram illustrating data write processing in the storage system disclosed in FIG. 2.
FIG. 6 is an explanatory diagram illustrating data write processing in the storage system disclosed in FIG. 2.
FIG. 7 is an explanatory diagram illustrating an outline of data write processing in the storage system disclosed in FIG. 2.
FIG. 8 is a flowchart showing the operation of data write processing in the storage system disclosed in FIG. 2.
FIG. 9 is a flowchart showing the operation of data write processing in the storage system disclosed in FIG. 2.
FIG. 10 is a flowchart showing the operation of data write processing in the storage system disclosed in FIG. 2.
FIG. 11 is a flowchart showing the operation of data write processing in the storage system disclosed in FIG. 2.
FIG. 12 is a block diagram showing the configuration of the storage system in Supplementary Note 1 of the present invention.

<Embodiment 1>
A first embodiment of the present invention will be described with reference to FIGS. 1 to 11. FIG. 1 is a block diagram showing an outline of the configuration of the storage system, and FIG. 2 is a functional block diagram showing its configuration. FIGS. 3 and 4 show examples of data stored in the storage system. FIGS. 5 to 7 are explanatory diagrams illustrating data write processing in the storage system. FIGS. 8 to 11 are flowcharts showing the operation of the storage system.

[Configuration]
As shown in FIG. 1, the storage system 1 of the present invention is configured by connecting a plurality of server computers. Specifically, the storage system 1 includes an access node 2, which is a server computer that controls storage and retrieval operations in the storage system 1 itself, and a storage node 3, which is a server computer equipped with storage devices that store data. The numbers of access nodes 2 and storage nodes 3 are not limited to those shown in FIG. 2; more nodes 2 and 3 may be connected. The storage system 1 of the present invention may also consist of a single computer.

In this embodiment, the storage system 1 is described as consisting of a deduplication controller 20 and a tiered storage system 30, as shown in FIG. 2. The deduplication controller 20 is implemented mainly by the access node 2 described above, and the tiered storage system 30 mainly by the storage node 3. However, the deduplication controller 20 and the tiered storage system 30 are not necessarily limited to being implemented by the access node 2 and the storage node 3, respectively.

As shown in FIG. 2, the deduplication controller 20 includes a deduplication processing unit 21, a data management unit 22, a NAS interface 23, a virtual file system 24, a data I/O I/F 25, and a non-duplicate block storage location determination unit 26, which are built by installing a program in its arithmetic device. The deduplication controller 20 also holds a determination selection policy 27 in its storage device.

The NAS interface 23 accepts connections over NAS protocols (CIFS, NFS, etc.). Here, the storage system 1 itself is a NAS (network attached storage) and receives the data to be stored from other servers such as backup servers and archive servers.

The deduplication processing unit 21 (duplication determination unit) divides input data into fixed-length or variable-length blocks and, by referring to the data management unit 22, determines for each block whether it has already been stored. Specifically, the deduplication processing unit 21 determines whether each block is a duplicate block, which has already been stored, or a non-duplicate block, which has not yet been stored.

The data management unit 22 manages the storage address of each block in a hash table looked up by the block's hash value.

The virtual file system 24 associates data input and output over the NAS protocol with the block-level data actually stored, and supplies the requested data.

The non-duplicate data storage destination device determination unit 26 (storage destination determination unit) determines on which device of the tiered storage system 30 a non-duplicate block, that is, a block not yet present in the storage, is to be stored, and sends that request to the tiered storage system 30.

The data I/O I/F 25 is an interface that passes data to the tiered storage system 30. It is implemented, for example, with FC (Fibre Channel) or iSCSI.

The determination selection policy 27 is a table that defines how the storage destination of non-duplicate blocks is to be determined. This policy can be rewritten externally.

As shown in FIG. 2, the tiered storage system 30 includes a data I/O I/F 31, an existing data storage destination acquisition I/F 32, a new data storage destination request reception I/F 33, and a tiering management unit 34, which are built by installing a program in its arithmetic device. The tiering management unit 34 further includes a data storage I/O management unit 35, a data storage destination request unit 36, and a data storage destination change unit 37.

The tiered storage system 30 also includes devices 41, 42, and 43, which are a plurality of storage devices with different performance for storing data. For example, the devices are storage devices whose access speeds to their storage areas are classified as high, medium, or low according to a predetermined criterion.

The data I/O I/F 31 is an interface that exchanges data with the deduplication controller 20.

The tiering management unit 34 (data storage destination control unit) manages data stored across a plurality of devices and has a function of migrating data based on, for example, access frequency. Specifically, when an I/O request arrives from the upper layer, the data storage I/O management unit 35 of the tiering management unit 34 translates between the address supplied by the upper layer and the address on the device where the data is actually stored, and transfers the data. For example, when a request to write new data arrives, it asks the data storage destination request unit 36 whether a storage destination device has been specified for the data being written and, if one has, stores the data on that device. It also has a function of returning the number of the device on which already-written data is stored.
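The address translation and request lookup just described can be pictured with the toy model below. It is a sketch only: the class and method names are hypothetical, a single integer host address stands in for the address-plus-offset pair used by the interfaces, devices are numbered 1 to T with 1 the fastest, and rewriting an existing address is not modeled.

```python
from typing import Optional


class DataStorageIOManager:
    """Toy host-address -> (device, device-address) translation with destination requests."""

    def __init__(self, num_devices: int, default_device: int) -> None:
        self.devices: dict[int, dict[int, bytes]] = {d: {} for d in range(1, num_devices + 1)}
        self.next_free: dict[int, int] = {d: 0 for d in self.devices}
        self.mapping: dict[int, tuple[int, int]] = {}  # host address -> (device, device address)
        self.requested: dict[int, int] = {}            # host address -> requested device number
        self.default_device = default_device

    def request_device(self, host_addr: int, device: int) -> None:
        """Record that the next write to host_addr should land on `device`."""
        self.requested[host_addr] = device

    def write(self, host_addr: int, data: bytes) -> None:
        # New data: honor a pending destination request if there is one, else use the default.
        device = self.requested.pop(host_addr, self.default_device)
        dev_addr = self.next_free[device]
        self.next_free[device] += 1
        self.devices[device][dev_addr] = data
        self.mapping[host_addr] = (device, dev_addr)

    def read(self, host_addr: int) -> bytes:
        device, dev_addr = self.mapping[host_addr]
        return self.devices[device][dev_addr]

    def device_of(self, host_addr: int) -> Optional[int]:
        """Return the number of the device holding host_addr, or None if never written."""
        entry = self.mapping.get(host_addr)
        return entry[0] if entry else None
```

`request_device` plays the role of the new data storage destination request path, and `device_of` that of the existing data storage destination query.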

The data storage destination request unit 36 of the tiering management unit 34 receives requests specifying a storage destination device from the deduplication controller 20. The data storage destination change unit 37 of the tiering management unit 34 periodically moves stored data to a different storage destination based on, for example, access frequency.
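A periodic, access-frequency-based migration pass might look like the sketch below. The hot and cold thresholds and the rule of moving one tier per pass are assumptions chosen for illustration; the text states only that data is moved periodically based on access frequency and similar factors.

```python
def rebalance(placement: dict[str, int], access_counts: dict[str, int],
              hot_threshold: int, cold_threshold: int, num_tiers: int) -> dict[str, int]:
    """Return a new placement after one periodic migration pass.

    Tiers are numbered 1..num_tiers with 1 the fastest. Blocks accessed at least
    hot_threshold times move one tier up; blocks accessed at most cold_threshold
    times move one tier down. Both thresholds and the one-step moves are assumptions.
    """
    new_placement = {}
    for block, tier in placement.items():
        hits = access_counts.get(block, 0)
        if hits >= hot_threshold and tier > 1:
            tier -= 1  # promote toward faster media
        elif hits <= cold_threshold and tier < num_tiers:
            tier += 1  # demote toward cheaper media
        new_placement[block] = tier
    return new_placement
```

Such a pass reacts only after access statistics accumulate, which is one reason the description aims to place new blocks well at write time instead of relying solely on later migration.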

The existing data storage destination device acquisition I/F 32 receives an address and an offset from the deduplication controller 20, asks the data storage I/O management unit 35 for the number of the device on which that data is stored (devices are numbered 1 to T in order from the fastest), and returns that value.

The new data storage destination request reception I/F 33 receives an address, an offset, and a storage destination device number from the deduplication controller 20 and registers them with the data storage destination request unit 36.

The deduplication controller 20 and the tiered storage system 30 that make up the storage system 1 in this embodiment have a function of eliminating duplicate storage of data. This function is implemented mainly by the deduplication processing unit 21, the data management unit 22, and the data storage I/O management unit 35, and its details are described with reference to FIGS. 5 and 6. In the following, the deduplication controller 20 and the tiered storage system 30 are not distinguished, and the operations are described as operations of the storage system 1. The method of eliminating duplicate storage is not limited to the one described below.

First, as indicated by arrow Y1 in FIGS. 5 and 6, the storage system 1 receives a file A for which a write has been requested. Then, as indicated by arrow Y2 in FIGS. 5 and 6, it divides file A into blocks D of a predetermined size (for example, 64 KB) or of variable length.
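The text does not say how variable-length blocks are delimited. One common technique is content-defined chunking. The sketch below uses a simplified gear rolling hash in the style of FastCDC, with block boundaries placed where the top bits of the hash are zero. The gear table, seed, and size parameters are arbitrary illustrative choices.

```python
import random

_rng = random.Random(2014)  # fixed seed so chunk boundaries are reproducible
GEAR = [_rng.getrandbits(32) for _ in range(256)]  # one random 32-bit value per byte value


def cdc_chunks(data: bytes, min_size: int = 2048, avg_bits: int = 13,
               max_size: int = 65536) -> list[bytes]:
    """Content-defined chunking with a gear rolling hash (simplified FastCDC style).

    A boundary is declared where the top avg_bits bits of the hash are zero, giving an
    expected gap of about 2**avg_bits bytes beyond min_size, capped at max_size.
    """
    mask = ((1 << avg_bits) - 1) << (32 - avg_bits)
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Shifting left by one per byte means only the last 32 bytes influence h.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks
```

Because boundaries depend on local content and not on absolute offsets, inserting bytes near the start of a file changes only the first chunk or so, and later blocks still deduplicate. Fixed-size blocking loses this property.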

Next, based on the content of each block D, the storage system calculates a unique hash value H representing that content (arrow Y3 in FIG. 6). For example, the hash value H is calculated from the content of block D using a preset hash function.

Next, the storage system uses the hash value H of each block D of file A to check whether that block has already been stored. Specifically, for every block D already stored, its hash value H is registered in an MFI (Main Fragment Index) file in association with a content address CA indicating its storage location. Therefore, if the hash value H computed for block D before storage is found in the MFI file, block D can be judged to be a duplicate block, that is, a block with identical content has already been stored (arrow Y4 in FIG. 6). In this case, the content address CA associated with the matching hash value H is obtained from the MFI file and returned as the content address CA of the block D for which the write was requested.

The already-stored data referenced by the returned content address CA is then used as the block D for which the write was requested. That is, designating the area referenced by the returned content address CA as the storage destination of the requested block D causes that block to be treated as stored. This removes the need to actually write block D to a device serving as a storage device.

If the block D for which the write was requested is judged to be a non-duplicate block that has not yet been stored, it is written as follows. First, block D is compressed and divided into a plurality of fragments of a predetermined size, as indicated by arrow Y5 in FIG. 6; for example, into nine fragments (divided data 91), as indicated by D1 to D9 in FIG. 5. Redundant data is then generated so that the original block can be restored even if some of the fragments are lost, and added to the divided fragments 91; for example, three fragments (redundant data 92), as indicated by D10 to D12 in FIG. 5. This produces a data set 90 of twelve fragments consisting of the nine divided fragments 91 and the three redundant fragments 92.
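The description does not name the erasure code. As a stand-in, the sketch below implements a systematic Reed-Solomon-style code over the prime field GF(257) using Lagrange interpolation. It compresses the block, splits it into K = 9 data fragments, adds M = 3 redundant fragments, and can rebuild the block from any 9 of the 12. The field choice keeps the sketch short. Because redundant symbols can equal 256, fragments are held as lists of ints; a production system would more likely use GF(2^8).

```python
import zlib

P = 257      # prime modulus; every byte value 0..255 is an element of GF(257)
K, M = 9, 3  # 9 data fragments + 3 redundant fragments, as in the FIG. 5 example


def _lagrange_eval(points: list[tuple[int, int]], x: int) -> int:
    """Evaluate at x the unique polynomial of degree < len(points) through `points`, mod P."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (x - xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, P - 2, P)) % P  # Fermat inverse of den
    return total


def encode(block: bytes) -> tuple[list[list[int]], int]:
    """Compress a block, split it into K fragments and append M redundant fragments.

    Fragment i (0-based) holds, per byte position, the value at x = i + 1 of one
    polynomial of degree < K; returns the fragments and the compressed length.
    """
    data = zlib.compress(block)
    frag_len = -(-len(data) // K)  # ceiling division
    padded = data.ljust(frag_len * K, b"\0")
    frags = [list(padded[i * frag_len:(i + 1) * frag_len]) for i in range(K)]
    parity = [[0] * frag_len for _ in range(M)]
    for pos in range(frag_len):
        points = [(i + 1, frags[i][pos]) for i in range(K)]
        for r in range(M):
            parity[r][pos] = _lagrange_eval(points, K + 1 + r)
    return frags + parity, len(data)


def decode(available: dict[int, list[int]], compressed_len: int) -> bytes:
    """Rebuild the block from any K of the K + M fragments (keys are fragment indices)."""
    if len(available) < K:
        raise ValueError(f"need at least {K} fragments, got {len(available)}")
    chosen = sorted(available.items())[:K]
    frag_len = len(chosen[0][1])
    data = bytearray()
    for i in range(K):
        if i in available:  # data fragment survived: copy it directly
            data.extend(available[i])
            continue
        for pos in range(frag_len):  # lost data fragment: interpolate it
            points = [(idx + 1, frag[pos]) for idx, frag in chosen]
            data.append(_lagrange_eval(points, i + 1))
    return zlib.decompress(bytes(data[:compressed_len]))
```

Any 9 distinct evaluation points determine a polynomial of degree at most 8, which is why losing any three of the twelve fragments, data or redundant, is tolerated.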

Next, the fragments that make up the data set generated above are distributed across the storage areas formed in the storage devices. For example, when twelve fragments D1 to D12 have been generated as shown in FIG. 5, they are stored one per file in data storage files formed in each of a plurality of storage devices (see arrow Y6 in FIG. 6).

Next, the storage system 1 generates and manages a content address CA representing the storage location of the stored fragments D1 to D12, that is, the storage location of the block D that can be restored from them. Specifically, the content address CA is generated by combining part of the hash value H calculated from the content of the stored block D (a short hash, for example the first 8 bytes of H) with information indicating the logical storage location. This content address CA is then returned to the file system in the storage system 1 (arrow Y7 in FIG. 6), and the storage system 1 manages identification information of the backup target data, such as its file name, in the file system in association with the content address CA.

Each storage node 3 also manages, in the MFI file, the content address CA of block D in association with its hash value H (arrow Y8 in FIG. 6). In this way, the content address CA is stored on a device in association with information identifying the file, the hash value H, and so on.
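The content address construction and the MFI lookup can be sketched together. The following are assumptions, not details from the text: SHA-256 as the hash, the logical position encoded as a big-endian 64-bit integer, the MFI as an in-memory dict, and the hypothetical names `make_content_address`, `MainFragmentIndex`, and `write_block`. Writing the fragments themselves is omitted.

```python
import hashlib
import struct
from typing import Optional

SHORT_HASH_LEN = 8  # "for example, the first 8 B of the hash value H"


def make_content_address(block: bytes, logical_position: int) -> bytes:
    """CA = short hash of the block + its logical storage position.

    SHA-256 and a big-endian 64-bit position are assumptions made for this sketch.
    """
    short_hash = hashlib.sha256(block).digest()[:SHORT_HASH_LEN]
    return short_hash + struct.pack(">Q", logical_position)


class MainFragmentIndex:
    """Toy MFI: maps a block's full hash value H to its content address CA."""

    def __init__(self) -> None:
        self._index: dict[bytes, bytes] = {}

    def lookup(self, block: bytes) -> Optional[bytes]:
        return self._index.get(hashlib.sha256(block).digest())

    def register(self, block: bytes, ca: bytes) -> None:
        self._index[hashlib.sha256(block).digest()] = ca


def write_block(block: bytes, mfi: MainFragmentIndex, next_position: int) -> tuple[bytes, int]:
    """Return (CA, next free logical position), reusing the CA of a duplicate block."""
    ca = mfi.lookup(block)
    if ca is not None:
        return ca, next_position  # duplicate block (arrow Y4): nothing is written
    ca = make_content_address(block, next_position)  # new block; fragment writes omitted
    mfi.register(block, ca)  # arrow Y8: record H -> CA
    return ca, next_position + 1
```

A repeated block returns the same CA without consuming a new logical position.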

As described above, the storage system 1 in this embodiment performs deduplication: a duplicate block is stored without actually writing its data, by referring to other data with the same content that is already stored on a device. Non-duplicate blocks, on the other hand, are divided into fragments, made redundant, and distributed across a plurality of storage devices. However, the storage system 1 of the present invention is not necessarily limited to eliminating duplicate storage by the method described above and may perform storage processing by other methods.

Furthermore, in the present invention, when the block D for which the write was requested is a non-duplicate block that has not yet been stored, as described above, the device on which it will be stored is determined as follows. This operation is described mainly with reference to the explanatory diagram of FIG. 7 and the flowcharts of FIGS. 8 to 11. FIG. 8 shows the operation of the storage system as a whole, and FIGS. 9 and 10 show the part of that operation in which the storage destination of a block is determined.

First, the operation of the deduplication controller 20 is described.

When a file is input to the deduplication controller 20, it is temporarily saved in a buffer area (step S1 in FIG. 8). The deduplication processing unit 21 then divides the file into fixed-length blocks starting from its beginning and performs duplication determination to check whether each block is already stored (step S2 in FIG. 8). Duplication determination is performed as described above, starting from the first block of the file. The blocks may also be of variable length.

If the duplication determination finds a duplicate block, the existing block is referenced without storing the actual data and, unless the end of the file has been reached, processing moves on to the duplication determination of the next block (see step S11: No in FIG. 9, and step S28: No and step S29 in FIG. 10).

A non-duplicate block, on the other hand, is stored on a device after its storage destination device has been determined. The process of determining the device on which a non-duplicate block is stored is described in detail below.

Specifically, when determining the storage destination of a non-duplicate block, the controller first checks whether the attribute of the file is defined in the determination selection policy 27, as shown in FIG. 7(A) (step S3 in FIG. 8). It then decides whether to use the "present method" set by the present invention or the "default policy" already set in the tiered storage system 30 as the policy for determining the storage destination of the non-duplicate blocks that make up the file. The "default policy" is a policy for determining the storage destination of non-duplicate blocks that is, for example, set as an initial setting or set based on access frequency, free capacity, and the like.

The determination selection policy 27 is configured, for example, as shown in FIG. 3. In FIG. 3, "attribute" means an attribute of a file written from the upper layer (host); for each attribute, the policy defines the determination method and the determination criterion parameters to apply when a file has that attribute. The "determination method" specifies how, at the time a file is created, the storage destination of its non-duplicate blocks (new data) is determined from the file's attribute.

In the example of FIG. 3, when the file extension is ".XXX", processing advances to the next stage so that the storage destinations of the non-duplicate blocks are determined by the present method (step S4 onward in FIG. 8; step S12 onward in FIG. 9). If the attribute does not match, the storage destination is determined by the default policy and the non-duplicate blocks are stored there (step S6 in FIG. 8).
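One way to picture the lookup in step S3 is a list of policy rows, each pairing an attribute test with the parameters used later. Everything concrete here is a placeholder: the text does not give the actual values behind FIG. 3's ".XXX" extension or "ZZZ"-byte size, so this sketch uses a hypothetical `.xxx` suffix, a 1,000,000-byte size limit, and made-up values for M, N, and P.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class PolicyRow:
    """One hypothetical row of determination selection policy 27."""
    matches: Callable[[str, int], bool]  # (file name, file size in bytes) -> attribute matches?
    m: int              # blocks examined on each side of a non-duplicate block
    n_percent: float    # duplicate-ratio threshold N
    p_percent: float    # per-device concentration threshold P
    on_fast: str        # setting when duplicates sit on a fast device: "same_device" or "default"
    on_slow: str        # setting when duplicates sit on a slow device


def select_policy(rows: list[PolicyRow], name: str, size: int) -> Optional[PolicyRow]:
    """Step S3 sketch: a matching row selects the present method; None means the default policy."""
    for row in rows:
        if row.matches(name, size):
            return row
    return None


rows = [
    PolicyRow(lambda n, s: n.endswith(".xxx"), 4, 50.0, 60.0, "same_device", "same_device"),
    PolicyRow(lambda n, s: s >= 1_000_000, 8, 50.0, 60.0, "same_device", "default"),
]
```

First-match ordering is an assumption; the text does not say how overlapping attribute rules are resolved.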

The storage-destination policy is defined per file attribute, as described above, because different kinds of files require different characteristics: some should be accessible at high speed, while others are better stored cheaply even if access is slow. For example, the differential data (non-duplicate blocks) between minor versions of an OS (operating system) kernel or of an application is expected to be accessed as fast as the surrounding already-stored data (duplicate blocks). Data files and the like, by contrast, may be acceptable to store always on a slow but inexpensive device.

Next, when the file attribute indicates that the storage destinations of non-duplicate blocks are to be determined by the present method, they are determined as follows, and a list of storage destinations such as the one shown in FIG. 4 is created (step S4 in FIG. 8). The operation in this case is described below mainly with reference to FIGS. 9 and 10.

First, for a non-duplicate block (step S11 in FIG. 9: Yes), the determination criterion parameters (M, N, P) corresponding to the file attribute are obtained from the determination selection policy 27 (step S12 in FIG. 9). Here, M specifies the number of neighboring blocks examined on each side of the non-duplicate block. N specifies the threshold, as a percentage, for the proportion of duplicate blocks among the M blocks before and after it, which is evaluated when the present method is used. P is the percentage used to determine whether at least P% of the duplicate blocks among those M blocks before and after are stored on one particular storage destination device.
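The window of "M blocks before and after" a non-duplicate block can be computed as in this minimal sketch. Clipping the window at the start and end of the file is an assumption; the text does not describe what happens near file boundaries.

```python
def neighbor_window(index: int, num_blocks: int, m: int) -> list[int]:
    """Positions of up to M blocks before and M blocks after block `index`, clipped to the file."""
    if m <= 0:
        raise ValueError("M must be positive")
    before = range(max(0, index - m), index)
    after = range(index + 1, min(num_blocks, index + m + 1))
    return [*before, *after]


print(neighbor_window(5, 20, 2))  # [3, 4, 6, 7]
print(neighbor_window(0, 3, 2))   # [1, 2]
```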

Next, it is determined whether the proportion of duplicate blocks B2 (blocks already present in the storage) within a predetermined range before and after the non-duplicate block B1, that is, the duplication rate, is at or above a threshold. Specifically, as shown in FIG. 7(B), the number of duplicate blocks B2 among the M blocks before and the M blocks after the non-duplicate block B1 is first counted (step S13 in FIG. 9). The proportion of duplicate blocks B2 among these 2M blocks is then computed, and it is checked whether that proportion exceeds N% (the threshold) (step S14 in FIG. 9). If the proportion does not exceed N% (step S14 in FIG. 9: No), the device set by the default policy is determined as the storage destination of the non-duplicate block B1 (step S25 in FIG. 10).
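Steps S13 and S14 reduce to counting duplicates in that window and comparing their share of the 2M positions with N. This sketch keeps the denominator at 2M as stated in the text, so positions that fall outside the file simply count as non-duplicates; that boundary treatment, and the strict "exceeds" comparison taken from step S14, are interpretations rather than details confirmed by the source.

```python
def exceeds_duplicate_ratio(is_dup: list[bool], index: int, m: int, n_percent: float) -> bool:
    """Steps S13-S14 sketch: do duplicates make up more than N% of the 2M neighbors of block `index`?"""
    lo = max(0, index - m)
    hi = min(len(is_dup), index + m + 1)
    dup_count = sum(1 for j in range(lo, hi) if j != index and is_dup[j])  # S13
    return 100.0 * dup_count / (2 * m) > n_percent                          # S14


flags = [True, True, False, True, False, True, True]
print(exceeds_duplicate_ratio(flags, 2, 2, 50.0))  # True  (3 of 4 neighbors are duplicates)
print(exceeds_duplicate_ratio(flags, 2, 2, 80.0))  # False
```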

If the proportion exceeds N% (step S14 in FIG. 9: Yes), the distribution of storage destinations of the duplicate blocks among the M blocks before and after the non-duplicate block is determined, as shown in FIG. 7(C) (step S15 in FIG. 9). Specifically, for every duplicate block B2 among those M blocks before and after, the number of the device on which it is currently stored is obtained, and the blocks are counted per device.

To do this, the non-duplicate block storage position determination unit 26 inputs the address and offset of each duplicate block into the existing data storage destination I/F 32 of the hierarchical storage system 30 and obtains the number (1 to T) of the device on which that block is stored. Device numbers 1 to T (where T is the number of devices) are assigned to the devices 41 to 43 of the hierarchical storage system 30 in descending order of access performance; that is, device number 1 is the fastest device and device number T is the slowest.
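Step S15 is a per-device tally. In this sketch, `lookup_device` is a hypothetical callable standing in for a query to the existing data storage destination I/F 32: given a duplicate block's address and offset, it returns a device number from 1 (fastest) to T (slowest).

```python
from collections import Counter
from typing import Callable, Iterable


def device_distribution(dup_blocks: Iterable[tuple[int, int]],
                        lookup_device: Callable[[int, int], int]) -> Counter:
    """Step S15 sketch: count the neighboring duplicate blocks stored on each device number."""
    counts: Counter = Counter()
    for address, offset in dup_blocks:
        counts[lookup_device(address, offset)] += 1
    return counts


# Hypothetical location table: (address, offset) -> device number.
table = {(0, 0): 1, (0, 4096): 1, (0, 8192): 3}
print(device_distribution(table, lambda a, o: table[(a, o)]))  # Counter({1: 2, 3: 1})
```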

It is then checked whether at least P% (the threshold) of the duplicate blocks among the M blocks before and after the non-duplicate block are stored on one particular device (step S16 in FIG. 9). If no single device stores P% of those duplicate blocks (step S16 in FIG. 9: No), the device set by the default policy is determined as the storage destination of the non-duplicate block B1 (step S25 in FIG. 10).
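The P% test of step S16 then looks for a single device holding a large enough share of those duplicates. If P is 50 or less, more than one device could pass; the tie-breaking rule in this sketch (most blocks first, then the lower, i.e. faster, device number) is an assumption not found in the text.

```python
from collections import Counter
from typing import Optional


def dominant_device(counts: Counter, p_percent: float) -> Optional[int]:
    """Step S16 sketch: return a device holding at least P% of the neighboring duplicates, else None."""
    total = sum(counts.values())
    if total == 0:
        return None
    # Most blocks wins; on a tie, prefer the lower (faster) device number.
    device, count = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    return device if 100.0 * count / total >= p_percent else None


print(dominant_device(Counter({1: 3, 3: 1}), 60.0))  # 1    (75% on device 1)
print(dominant_device(Counter({1: 3, 3: 1}), 80.0))  # None (falls back to the default policy)
```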

If, on the other hand, one particular device stores at least P% of the duplicate blocks (step S16 in FIG. 9: Yes), the storage destination of the non-duplicate block is finally determined from that device and the determination selection policy 27, as shown in FIG. 7(D). Specifically, from the number of the device storing at least P% of the duplicate blocks, it is determined whether that device is a high-speed device (device number ≤ T/2) or a low-speed device (device number > T/2) (step S21 in FIG. 10). The storage position of the non-duplicate block is then determined according to the storage destination that the determination selection policy 27 specifies for the case where the duplicate blocks reside on a high-speed device, or for the case where they reside on a low-speed device.
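The high/low-speed split of step S21 is a simple comparison against T/2 on the performance-ordered device numbers; the function name below is hypothetical. With the three devices 41 to 43 (T = 3), only device 1 counts as high-speed under this rule, so the medium-speed device 42 is treated as low-speed.

```python
def is_fast_device(device_no: int, num_devices: int) -> bool:
    """Step S21 sketch: devices are numbered 1..T by descending access performance."""
    if not 1 <= device_no <= num_devices:
        raise ValueError("device number must be in 1..T")
    return device_no <= num_devices / 2


print([is_fast_device(d, 3) for d in (1, 2, 3)])     # [True, False, False]
print([is_fast_device(d, 4) for d in (1, 2, 3, 4)])  # [True, True, False, False]
```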

For example, in the determination selection policy 27 of FIG. 3, for "files of ZZZ bytes or larger", when "P% or more of the duplicate blocks among the M blocks before and after reside on a high-speed device" (step S21 in FIG. 10: Yes), the high-speed device setting is used (step S22 in FIG. 10). In this case, the device storing at least P% of the duplicate blocks is determined as the storage destination of the non-duplicate block (step S24 in FIG. 10: Yes; step S25). When "P% or more of the duplicate blocks among the M blocks before and after reside on a low-speed device" (step S21 in FIG. 10: No), the low-speed device setting is used (step S23 in FIG. 10). In this case, the device given by the default policy is determined as the storage destination (step S24 in FIG. 10: No; step S26).
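Putting steps S21 to S26 together for the FIG. 3 example: the high-speed setting keeps the block with its neighbors, and the low-speed setting defers to the default policy. In this sketch the settings are encoded as the hypothetical strings `"same_device"` and `"default"`, and `default_device` stands in for whatever device the default policy would pick.

```python
from typing import Optional


def resolve_destination(dominant: Optional[int], num_devices: int,
                        on_fast: str, on_slow: str, default_device: int) -> int:
    """Steps S21-S26 sketch: choose the device number for one non-duplicate block."""
    if dominant is None:
        return default_device  # the ratio (S14) or concentration (S16) test failed
    setting = on_fast if dominant <= num_devices / 2 else on_slow  # S21-S23
    # S24: "same_device" co-locates with the neighboring duplicates (S25); otherwise default (S26).
    return dominant if setting == "same_device" else default_device


# FIG. 3 "ZZZ bytes or larger" case, with T = 3 and a hypothetical default device 3:
print(resolve_destination(1, 3, "same_device", "default", 3))  # 1
print(resolve_destination(2, 3, "same_device", "default", 3))  # 3
```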

When the above processing determines that the non-duplicate block is to be stored on the device holding at least P% of the duplicate blocks (step S25 in FIG. 10), this information is added to the list of requests to be sent to the hierarchical storage system 30 (step S27 in FIG. 10). For example, as shown in FIG. 4, the address, the offset, and the storage destination device number are added to the list as the storage destination information of the non-duplicate block (step S27).
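The per-block decisions can be collected into a request list like that of FIG. 4 in a single pass over the file. The following self-contained sketch combines steps S11 to S27; the `Request` record, the use of a base address plus a block-aligned offset, and the `device_of` mapping (position of a duplicate block to its device number) are hypothetical stand-ins. Only blocks that end up co-located with their neighbors produce a request; the rest are left to the default policy, as in step S27.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """One row of the request list (cf. FIG. 4)."""
    address: int
    offset: int
    device_no: int


def build_requests(is_dup: list[bool], device_of: dict[int, int], address: int,
                   block_size: int, m: int, n_percent: float, p_percent: float,
                   num_devices: int, on_fast: str = "same_device",
                   on_slow: str = "default") -> list[Request]:
    """Walk every block to the end of the file and record placements for non-duplicate blocks."""
    requests: list[Request] = []
    for i, dup in enumerate(is_dup):
        if dup:
            continue  # duplicates only reference existing blocks
        lo, hi = max(0, i - m), min(len(is_dup), i + m + 1)
        dups = [j for j in range(lo, hi) if j != i and is_dup[j]]
        if not dups or 100.0 * len(dups) / (2 * m) <= n_percent:
            continue  # S14: No -> default policy, nothing to request
        counts = Counter(device_of[j] for j in dups)
        device, count = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
        if 100.0 * count / len(dups) < p_percent:
            continue  # S16: No -> default policy
        setting = on_fast if device <= num_devices / 2 else on_slow
        if setting == "same_device":
            requests.append(Request(address, i * block_size, device))  # S25, S27
    return requests
```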

The above processing is repeated up to the last block of the file (step S28 in FIG. 10). When the end of the file is reached, the deduplication controller 20 sends the request list to the data storage destination request reception I/F 33 of the hierarchical storage system 30 (step S5 in FIG. 8), and then writes the non-duplicate blocks to the hierarchical storage system 30 via the data I/O I/Fs 25 and 31 (step S6 in FIG. 8).

Next, the operation of the hierarchical storage system 30 is described in detail with reference to the flowchart of FIG. 11.

First, when a data write request arrives at the data I/O I/F 31, the data storage destination requests are consulted to check whether the deduplication controller 20 has specified a storage destination device for the write data (step S31 in FIG. 11). If a storage destination device has been specified by a request from the deduplication controller 20 (step S31 in FIG. 11: Yes), the requested storage destination is obtained (step S32 in FIG. 11); for example, the storage position of the non-duplicate block is obtained from the address in the list shown in FIG. 4.

The non-duplicate block is then written at the obtained storage position on the device specified by the list, and the address conversion table is updated (step S33 in FIG. 11). If no storage destination device has been specified by a request (step S31 in FIG. 11: No), the block is written to the device specified by the default policy of the hierarchical storage system 30, and the address conversion table is updated (step S34 in FIG. 11).
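On the storage side, steps S31 to S34 amount to a lookup in the received request list followed by a write and an address-table update. The class below is a toy, in-memory model; its names, and the use of an `(address, offset)` pair as the lookup key, are assumptions for illustration rather than the actual interfaces 31 to 33.

```python
from dataclasses import dataclass, field


@dataclass
class TieredStore:
    """Toy model of hierarchical storage system 30 (all names hypothetical)."""
    default_device: int
    pending: dict[tuple[int, int], int] = field(default_factory=dict)          # requested placements
    devices: dict[int, dict[tuple[int, int], bytes]] = field(default_factory=dict)
    address_table: dict[tuple[int, int], int] = field(default_factory=dict)    # (address, offset) -> device

    def receive_requests(self, requests: list[tuple[int, int, int]]) -> None:
        """Accept the request list sent by the deduplication controller (step S5)."""
        for address, offset, device_no in requests:
            self.pending[(address, offset)] = device_no

    def write(self, address: int, offset: int, data: bytes) -> int:
        """Write one block and return the device number it landed on."""
        key = (address, offset)
        # S31/S32: use the requested device if one was specified; otherwise the default policy (S34).
        device_no = self.pending.pop(key, self.default_device)
        self.devices.setdefault(device_no, {})[key] = data  # S33/S34: write the block
        self.address_table[key] = device_no                 # update the address conversion table
        return device_no
```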

In the storage system of the present invention described above, the non-duplicate blocks that make up a file can be stored on the same device as the related duplicate blocks belonging to the same file, and thus on a device appropriate to the use and attributes of the data. As a result, processing delays and reduced utilization efficiency of the storage devices can be suppressed, data migration between storage devices can be reduced, and the performance of the storage system can be improved.

In the above description, the storage destination of a non-duplicate block is determined by referring to the storage destinations of the duplicate blocks within a predetermined range before and after that block; however, the invention is not limited to using such neighboring duplicate blocks. The storage destination of a non-duplicate block may instead be determined by referring to the storage destinations of duplicate blocks judged to be related to it by some other criterion.

In the storage system of the present invention, the devices of another kind of storage apparatus may be used instead of the hierarchical storage system 30 described above. For example, a combination of a CPU cache, memory, and an HDD may serve as a set of devices that store data at different access speeds. The storage devices serving as storage destinations need not differ in access speed, however; any storage devices may be used, whether they have the same performance or differ in other aspects of performance.

<Appendix>
Part or all of the above embodiment can also be described as in the following appendices. The configurations of the storage system (see FIG. 12), the program, and the data storage method according to the present invention are outlined below. However, the present invention is not limited to the following configurations.

(Appendix 1)
A storage system comprising:
a duplication determination unit that determines whether storage target data is in a duplicate state, that is, already stored in a storage device;
a storage destination determination unit that determines a storage destination of non-duplicate data, which is storage target data that is not duplicated; and
a data storage control unit that stores the non-duplicate data in the storage device serving as the determined storage destination and, for duplicate data, which is storage target data that is duplicated, uses as that duplicate data a reference to other data that has the same content and is already stored in a storage device,
wherein the storage destination determination unit determines the storage destinations of the duplicate data judged to be related to the non-duplicate data according to a preset criterion, and determines the storage destination of the non-duplicate data based on the result of that determination.

(Appendix 2)
The storage system according to Appendix 1, wherein
the non-duplicate data and the duplicate data constituting the storage target data are data obtained by dividing the data to be stored into a plurality of blocks, and
the storage destination determination unit determines the storage destinations of the duplicate data located within a predetermined range before and after the position of the non-duplicate data prior to division, and determines the storage destination of the non-duplicate data based on the result of that determination.

(Appendix 3)
The storage system according to Appendix 2, wherein
the storage destination determination unit, when the proportion of the duplicate data among all blocks located within a predetermined range before and after the position of the non-duplicate data prior to division is equal to or greater than a threshold, determines the storage destinations of that duplicate data, and determines the storage destination of the non-duplicate data based on the result of that determination.

(Appendix 4)
The storage system according to Appendix 2 or 3, wherein
the storage destination determination unit determines, for each storage destination, the proportion of the duplicate data located within a predetermined range before and after the position of the non-duplicate data prior to division that is stored there, and determines the storage destination of the non-duplicate data based on the result of that determination.

(Appendix 5)
The storage system according to Appendix 4, wherein
the storage destination determination unit identifies the performance of a storage device serving as a storage destination whose proportion of the duplicate data located within a predetermined range before and after the position of the non-duplicate data prior to division is equal to or greater than a threshold, and determines the storage destination of the non-duplicate data based on the identified performance of that storage device as the determination result.

(Appendix 6)
The storage system according to Appendix 5, wherein
the storage destination determination unit identifies, as the performance of the storage device serving as the storage destination whose proportion of the duplicate data is equal to or greater than the threshold, whether the access speed to the storage area of that storage device is high according to a predetermined criterion, and determines the storage destination of the non-duplicate data based on the identified performance as the determination result.

(Appendix 7)
The storage system according to any one of Appendices 1 to 6,
which stores a default policy in which the storage destination of the non-duplicate data is set in advance, and a determination result policy in which the storage destination of the non-duplicate data is set according to the determination result,
wherein the storage destination determination unit decides, based on the determination result, whether to use the default policy or the determination result policy, and determines the storage destination of the non-duplicate data based on the policy so decided.

(Appendix 8)
The storage system according to Appendix 7, wherein
the storage destination determination unit:
decides, according to an attribute of the data to be stored, whether to use the default policy;
when it decides to use the default policy according to the attribute of the data to be stored, determines the storage destination of the non-duplicate data based on the default policy; and
when it decides not to use the default policy according to the attribute of the data to be stored, decides, based on the determination result, whether to use the default policy or the determination result policy, and determines the storage destination of the non-duplicate data based on the policy so decided.

According to the above invention, it is first determined whether storage target data is already stored in a storage device (duplicate data) or not yet stored (non-duplicate data). For duplicate data, the storage process is completed by referring to the existing data already stored in the storage device. For non-duplicate data, the storage destinations of the duplicate data related to it are determined, the storage destination of the non-duplicate data is decided based on that determination, and the non-duplicate data is stored in the storage device at the decided destination.

Here, for example, the blocks located within a predetermined range before and after the non-duplicate block, in the data layout prior to division, are used as the duplicate data related to the non-duplicate data. The storage destination of the non-duplicate data is then determined according to the proportion of duplicate data among the blocks in that range and the proportion of that duplicate data held by each storage destination; for example, according to the access performance of the storage device holding the duplicate data. Furthermore, depending on the attributes of the data to be stored and on the above determination result, it is decided whether to use the storage destination set by the default policy or a storage destination according to the determination result.

Thus, in the present invention, non-duplicate data can be stored in the same storage device as the related duplicate data, or in a storage device related to it, and hence in a storage device appropriate to the use and attributes of the data. As a result, processing delays and reduced utilization efficiency of the storage devices can be suppressed, data migration between storage devices can be reduced, and the performance of the storage system can be improved.

(Appendix 9)
A program for causing an information processing device to implement:
a duplication determination unit that determines whether storage target data is in a duplicate state, that is, already stored in a storage device;
a storage destination determination unit that determines a storage destination of non-duplicate data, which is storage target data that is not duplicated; and
a data storage control unit that stores the non-duplicate data in the storage device serving as the determined storage destination and, for duplicate data, which is storage target data that is duplicated, uses as that duplicate data a reference to other data that has the same content and is already stored in a storage device,
and for causing the storage destination determination unit to determine the storage destinations of the duplicate data judged to be related to the non-duplicate data according to a preset criterion, and to determine the storage destination of the non-duplicate data based on the result of that determination.

(Appendix 10)
A data storage method comprising:
determining whether storage target data is in a duplicate state, that is, already stored in a storage device;
determining a storage destination of non-duplicate data, which is storage target data that is not duplicated; and
performing data storage processing that stores the non-duplicate data in the storage device serving as the determined storage destination and, for duplicate data, which is storage target data that is duplicated, uses as that duplicate data a reference to other data that has the same content and is already stored in a storage device,
wherein the storage destinations of the duplicate data judged to be related to the non-duplicate data according to a preset criterion are determined, and the storage destination of the non-duplicate data is determined based on the result of that determination.

(Appendix 11)
The data storage method according to Appendix 10, wherein
the non-duplicate data and the duplicate data constituting the storage target data are data obtained by dividing the data to be stored into a plurality of blocks,
the method comprising determining the storage destinations of the duplicate data located within a predetermined range before and after the position of the non-duplicate data prior to division, and determining the storage destination of the non-duplicate data based on the result of that determination.

(Appendix 12)
The data storage method according to Appendix 11, comprising:
when the proportion of the duplicate data among all blocks located within a predetermined range before and after the position of the non-duplicate data prior to division is equal to or greater than a threshold, determining the storage destinations of that duplicate data, and determining the storage destination of the non-duplicate data based on the result of that determination.

(Appendix 13)
The data storage method according to Appendix 11 or 12, comprising:
determining, for each storage destination, the proportion of the duplicate data located within a predetermined range before and after the position of the non-duplicate data prior to division that is stored there, and determining the storage destination of the non-duplicate data based on the result of that determination.

(Appendix 14)
The data storage method according to Appendix 13, comprising:
identifying the performance of a storage device serving as a storage destination whose proportion of the duplicate data located within a predetermined range before and after the position of the non-duplicate data prior to division is equal to or greater than a threshold, and determining the storage destination of the non-duplicate data based on the identified performance of that storage device as the determination result.

(Appendix 15)
The data storage method according to Appendix 14, comprising:
identifying, as the performance of the storage device serving as the storage destination whose proportion of the duplicate data is equal to or greater than the threshold, whether the access speed to the storage area of that storage device is high according to a predetermined criterion, and determining the storage destination of the non-duplicate data based on the identified performance as the determination result.

(Appendix 16)
The data storage method according to any one of Appendices 10 to 15, comprising:
deciding, based on the determination result, whether to use a default policy in which the storage destination of the non-duplicate data is set in advance or a determination result policy in which the storage destination of the non-duplicate data is set according to the determination result, and determining the storage destination of the non-duplicate data based on the policy so decided.

(Appendix 17)
The data storage method according to Appendix 16, comprising:
deciding, according to an attribute of the data to be stored, whether to use the default policy;
when it is decided to use the default policy according to the attribute of the data to be stored, determining the storage destination of the non-duplicate data based on the default policy; and
when it is decided not to use the default policy according to the attribute of the data to be stored, deciding, based on the determination result, whether to use the default policy or the determination result policy, and determining the storage destination of the non-duplicate data based on the policy so decided.

The program described above is stored in a storage device or recorded on a computer-readable recording medium. The recording medium is, for example, a portable medium such as a flexible disk, an optical disk, a magneto-optical disk, or a semiconductor memory.

Although the present invention has been described with reference to the above embodiments and the like, the present invention is not limited to those embodiments. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within its scope.

Description of Reference Signs
1 Storage system
2 Access node
3 Storage node
20 Deduplication controller
21 Deduplication processing unit
22 Data management unit
23 NAS interface
24 Virtual file system
25 Data I/O I/F
26 Non-duplicate block storage location determination unit
27 Determination selection policy
30 Hierarchical storage system
31 Data I/O I/F
32 Existing data storage destination acquisition I/F
33 New data storage destination request I/F
34 Hierarchical management unit
35 Data storage I/F management unit
36 Data storage destination request unit
37 Data storage destination change unit
41 High-speed device
42 Medium-speed device
43 Low-speed device
100 Storage system
101 Duplication determination unit
102 Storage destination determination unit
103 Data storage control unit
110 Storage device

Claims

A storage system comprising:
a duplication determination unit that determines whether storage target data is in a duplicate state in which the same data is already stored in a storage device;
a storage destination determination unit that determines a storage destination of non-duplicate data, which is storage target data that is not in the duplicate state; and
a data storage control unit that stores the non-duplicate data in the storage device serving as the determined storage destination and that, for duplicate data, which is storage target data in the duplicate state, refers to other data that has the same content and is already stored in the storage device and uses that other data as the duplicate data,
wherein the non-duplicate data and the duplicate data constituting the storage target data are blocks obtained by dividing the data to be stored into a plurality of blocks, and
the storage destination determination unit determines the storage destinations of the duplicate data located within a predetermined range before and after the position of the non-duplicate data before the division, and determines the storage destination of the non-duplicate data based on the determination result.
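The flow of this claim can be sketched in Python as follows. It is a minimal sketch under several assumptions that the claim does not fix: fixed-size chunking, SHA-256 fingerprints, a hypothetical `index` dictionary mapping fingerprints to devices, and a majority vote over nearby duplicate destinations as the placement rule. The function names `split_blocks` and `plan_store` are illustrative only.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Tuple


def split_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Divide the data to be stored into fixed-size blocks (assumed chunking scheme)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def plan_store(data: bytes, index: Dict[str, str], block_size: int = 4,
               window: int = 2, default_dest: str = "medium") -> List[Tuple[str, str, str]]:
    """Return one (kind, fingerprint, destination) entry per block.

    `index` maps a block fingerprint to the device that already holds that block.
    Duplication is judged only against data stored before this call.
    """
    fps = [hashlib.sha256(b).hexdigest() for b in split_blocks(data, block_size)]
    # Duplication determination: is identical content already stored?
    is_dup = [fp in index for fp in fps]
    plan = []
    for pos, fp in enumerate(fps):
        if is_dup[pos]:
            # Duplicate block: store only a reference to the existing copy.
            plan.append(("ref", fp, index[fp]))
            continue
        # Storage destinations of duplicate blocks within +/- `window` positions.
        lo, hi = max(0, pos - window), min(len(fps), pos + window + 1)
        nearby = [index[fps[j]] for j in range(lo, hi) if j != pos and is_dup[j]]
        # Assumed rule: follow the most common nearby destination, else the default.
        dest = Counter(nearby).most_common(1)[0][0] if nearby else default_dest
        plan.append(("new", fp, dest))
    return plan
```

The caller would then write the `"new"` blocks to their planned devices and add their fingerprints to `index`. The majority vote is only one possible reading of "based on the determination result"; the dependent claims below refine it.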

The storage system according to claim 1, wherein
the storage destination determination unit determines the storage destinations of the duplicate data when the ratio of the duplicate data to all blocks located within the predetermined range before and after the position of the non-duplicate data before the division is equal to or greater than a threshold, and determines the storage destination of the non-duplicate data based on the determination result.
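The threshold condition of this claim can be illustrated as a simple gate. In this hypothetical sketch, `is_dup` holds one flag per block in the original order, the non-duplicate block itself is assumed to be excluded from the count, and both the window size and the threshold are free parameters, since the claim leaves their values open.

```python
from typing import List


def duplicate_ratio(is_dup: List[bool], pos: int, window: int) -> float:
    """Fraction of duplicate blocks among the blocks within +/- `window` of `pos`.

    Assumption: the non-duplicate block at `pos` itself is excluded from the count.
    """
    lo, hi = max(0, pos - window), min(len(is_dup), pos + window + 1)
    nearby = [is_dup[j] for j in range(lo, hi) if j != pos]
    return sum(nearby) / len(nearby) if nearby else 0.0


def should_check_destinations(is_dup: List[bool], pos: int, window: int,
                              threshold: float) -> bool:
    """Inspect duplicate destinations only if the duplicate ratio reaches the threshold."""
    return duplicate_ratio(is_dup, pos, window) >= threshold
```

For the flags `[True, True, False, True, False]`, the block at position 2 with a window of 2 sees three duplicates among four neighbours, a ratio of 0.75, so a threshold of 0.5 passes and 0.8 does not.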

The storage system according to claim 1 or 2, wherein
the storage destination determination unit determines, for each storage destination, the ratio of the duplicate data located within the predetermined range before and after the position of the non-duplicate data before the division, and determines the storage destination of the non-duplicate data based on the determination result.

The storage system according to claim 3, wherein
the storage destination determination unit identifies the performance of the storage device serving as a storage destination for which the per-destination ratio of the duplicate data located within the predetermined range before and after the position of the non-duplicate data before the division is equal to or greater than a threshold, uses the identified performance of that storage device as the determination result, and determines the storage destination of the non-duplicate data based on the determination result.

The storage system according to claim 4, wherein
the storage destination determination unit identifies, as the performance of the storage device serving as the storage destination for which the per-destination ratio of the duplicate data is equal to or greater than the threshold, whether the access speed to the storage area of that storage device is high according to a predetermined criterion, uses the identified performance as the determination result, and determines the storage destination of the non-duplicate data based on the determination result.
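Claims 3 to 5 can be read together as: compute each destination's share of the nearby duplicate blocks, pick a destination whose share reaches the threshold, and check whether that device is fast. The Python sketch below illustrates that reading under loud assumptions: the `DEVICE_SPEED_MBPS` table and its figures, the `FAST_CRITERION_MBPS` cut-off, the default tier, and the final placement rule (follow a fast device, otherwise fall back to the default) are all hypothetical and not taken from the patent.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical device table and speed criterion (illustrative values only).
DEVICE_SPEED_MBPS: Dict[str, int] = {"high": 2000, "medium": 500, "low": 150}
FAST_CRITERION_MBPS = 1000  # assumed "predetermined criterion" for a high access speed


def per_destination_ratio(nearby_dests: List[str]) -> Dict[str, float]:
    """Share of the nearby duplicate blocks held by each storage destination."""
    total = len(nearby_dests)
    return {d: c / total for d, c in Counter(nearby_dests).items()} if total else {}


def decide_destination(nearby_dests: List[str], ratio_threshold: float = 0.5,
                       default: str = "medium") -> str:
    ratios = per_destination_ratio(nearby_dests)
    # Destinations whose share of the nearby duplicates reaches the threshold.
    dominant = [d for d, r in ratios.items() if r >= ratio_threshold]
    if not dominant:
        return default
    best = max(dominant, key=ratios.__getitem__)
    # Determination result: is that device fast under the assumed criterion?
    is_fast = DEVICE_SPEED_MBPS[best] >= FAST_CRITERION_MBPS
    # Assumed placement rule: follow a fast device, otherwise use the default tier.
    return best if is_fast else default
```

With nearby duplicates on `["high", "high", "low"]`, the high-speed device holds two thirds of them and passes the speed check, so the new block goes to `"high"`; with `["low", "low", "high"]`, the dominant device is slow and the block falls back to `"medium"`.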

The storage system according to any one of claims 1 to 5, wherein
a prescribed policy, in which the storage destination of the non-duplicate data is set in advance, and a determination result policy, in which the storage destination of the non-duplicate data is set according to the determination result, are stored, and
the storage destination determination unit decides, based on the determination result, whether to use the prescribed policy or the determination result policy, and determines the storage destination of the non-duplicate data based on the decided policy.

The storage system according to claim 6, wherein
the storage destination determination unit:
first decides, based on an attribute of the data to be stored, whether to use the prescribed policy;
when it decides, based on the attribute of the data to be stored, to use the prescribed policy, determines the storage destination of the non-duplicate data based on the prescribed policy; and
when it decides, based on the attribute of the data to be stored, not to use the prescribed policy, decides based on the determination result whether to use the prescribed policy or the determination result policy, and determines the storage destination of the non-duplicate data based on the decided policy.

A storage system comprising:
a duplication determination unit that determines whether storage target data is in a duplicate state in which the same data is already stored in a storage device;
a storage destination determination unit that determines a storage destination of non-duplicate data, which is storage target data that is not in the duplicate state; and
a data storage control unit that stores the non-duplicate data in the storage device serving as the determined storage destination and that, for duplicate data, which is storage target data in the duplicate state, refers to other data that has the same content and is already stored in the storage device and uses that other data as the duplicate data,
wherein the storage destination determination unit determines the storage destination of the duplicate data that is judged, according to a predetermined criterion, to be related to the non-duplicate data, and determines the storage destination of the non-duplicate data based on the determination result,
the storage system further stores a prescribed policy, in which the storage destination of the non-duplicate data is set in advance, and a determination result policy, in which the storage destination of the non-duplicate data is set according to the determination result, and
the storage destination determination unit decides, based on the determination result, whether to use the prescribed policy or the determination result policy, and determines the storage destination of the non-duplicate data based on the decided policy.

A program for causing an information processing device to implement:
a duplication determination unit that determines whether storage target data is in a duplicate state in which the same data is already stored in a storage device;
a storage destination determination unit that determines a storage destination of non-duplicate data, which is storage target data that is not in the duplicate state; and
a data storage control unit that stores the non-duplicate data in the storage device serving as the determined storage destination and that, for duplicate data, which is storage target data in the duplicate state, refers to other data that has the same content and is already stored in the storage device and uses that other data as the duplicate data,
wherein the non-duplicate data and the duplicate data constituting the storage target data are blocks obtained by dividing the data to be stored into a plurality of blocks, and
the storage destination determination unit determines the storage destinations of the duplicate data located within a predetermined range before and after the position of the non-duplicate data before the division, and determines the storage destination of the non-duplicate data based on the determination result.

A program for causing an information processing device to implement:
a duplication determination unit that determines whether storage target data is in a duplicate state in which the same data is already stored in a storage device;
a storage destination determination unit that determines a storage destination of non-duplicate data, which is storage target data that is not in the duplicate state; and
a data storage control unit that stores the non-duplicate data in the storage device serving as the determined storage destination and that, for duplicate data, which is storage target data in the duplicate state, refers to other data that has the same content and is already stored in the storage device and uses that other data as the duplicate data,
wherein the storage destination determination unit determines the storage destination of the duplicate data that is judged, according to a predetermined criterion, to be related to the non-duplicate data, and determines the storage destination of the non-duplicate data based on the determination result,
the information processing device stores a prescribed policy, in which the storage destination of the non-duplicate data is set in advance, and a determination result policy, in which the storage destination of the non-duplicate data is set according to the determination result, and
the storage destination determination unit decides, based on the determination result, whether to use the prescribed policy or the determination result policy, and determines the storage destination of the non-duplicate data based on the decided policy.

A data storage method comprising:
determining whether storage target data is in a duplicate state in which the same data is already stored in a storage device;
determining a storage destination of non-duplicate data, which is storage target data that is not in the duplicate state; and
performing data storage processing that stores the non-duplicate data in the storage device serving as the determined storage destination and that, for duplicate data, which is storage target data in the duplicate state, refers to other data that has the same content and is already stored in the storage device and uses that other data as the duplicate data,
wherein the non-duplicate data and the duplicate data constituting the storage target data are blocks obtained by dividing the data to be stored into a plurality of blocks, and
the method further comprises determining the storage destinations of the duplicate data located within a predetermined range before and after the position of the non-duplicate data before the division, and determining the storage destination of the non-duplicate data based on the determination result.

A data storage method comprising:
determining whether storage target data is in a duplicate state in which the same data is already stored in a storage device;
determining a storage destination of non-duplicate data, which is storage target data that is not in the duplicate state;
performing data storage processing that stores the non-duplicate data in the storage device serving as the determined storage destination and that, for duplicate data, which is storage target data in the duplicate state, refers to other data that has the same content and is already stored in the storage device and uses that other data as the duplicate data;
determining the storage destination of the duplicate data that is judged, according to a preset criterion, to be related to the non-duplicate data, and determining the storage destination of the non-duplicate data based on the determination result;
storing a prescribed policy, in which the storage destination of the non-duplicate data is set in advance, and a determination result policy, in which the storage destination of the non-duplicate data is set according to the determination result; and
deciding, based on the determination result, whether to use the prescribed policy or the determination result policy, and determining the storage destination of the non-duplicate data based on the decided policy.