JP2015158765A

JP2015158765A - storage system

Info

Publication number: JP2015158765A
Application number: JP2014032590A
Authority: JP
Inventors: 悠永田; Hisashi Nagata
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2014-02-24
Filing date: 2014-02-24
Publication date: 2015-09-03
Anticipated expiration: 2034-02-24
Also published as: JP6337507B2

Abstract

PROBLEM TO BE SOLVED: To solve the problem that RAW backup lowers the efficiency of eliminating redundant storage. SOLUTION: A storage system 100 of the present invention includes: image analysis means 101 that analyzes RAW image data targeted for RAW backup; and backup means 102 that performs RAW backup by storing the RAW image data in a storage device while eliminating redundant storage. The image analysis means analyzes the RAW image data and extracts from it data that can be classified into predetermined groups. The backup means combines the extracted data into one piece of storage data for each classified group, combines the data that were not extracted into one piece of storage data, and stores each of the resulting pieces of storage data.

Description

The present invention relates to a storage system, and more particularly to a storage system having a duplicate storage elimination (deduplication) function.

In recent years, with the development and spread of computers, various types of information have been converted into digital data. Storage devices such as magnetic tapes and magnetic disks are used to store such digital data. Because the amount of data to be stored grows day by day and becomes enormous, large-capacity storage systems are required. Such systems must also be reliable while keeping storage device costs low, and the data must be easy to retrieve later. As a result, there is demand for a storage system that can automatically scale its capacity and performance, reduce storage costs by eliminating duplicate storage, and provide high redundancy.

In response to this situation, content address storage systems have been developed in recent years, as shown in Patent Document 1. In a content address storage system, data is distributed and stored across a plurality of storage devices, and the storage location of each piece of data is specified by a unique content address determined from the content of that data.

As described above, because the content address is generated so as to be unique to the content of the data, duplicate data can be obtained by referring to the data already stored at the same storage location. It is therefore unnecessary to store duplicate data separately, so duplicate recording is eliminated and the data volume is reduced. In other words, a content address storage system has a deduplication function in which new data is stored only when data with the same content is not already stored.

In addition, the storage system divides a chunk, which is block data of a predetermined size, into a plurality of pieces of fragment data, adds further fragments that serve as redundant data, and stores these fragments on a plurality of storage devices. Later, by specifying a content address, the fragment data stored at the storage locations identified by that content address can be read, and the original chunk can be restored from the plurality of fragments.

Because the storage system adds fragment data that serve as redundant data in this way, the original chunk can be regenerated even if some fragments are lost, as long as the number of lost fragments does not exceed the number of added redundant fragments.
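The fragment-and-recover idea above can be illustrated with the simplest possible redundancy scheme. The sketch below is an assumption, not the scheme used by the system described here: it splits a chunk into `k` data fragments and adds a single XOR parity fragment, so it tolerates the loss of exactly one fragment. Systems that add several redundant fragments typically use erasure codes such as Reed-Solomon instead. The function names `encode` and `decode` are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(chunk: bytes, k: int) -> list:
    """Split a chunk into k equal data fragments (zero-padded) plus one XOR parity fragment."""
    size = -(-len(chunk) // k)  # ceiling division
    padded = chunk.ljust(size * k, b"\0")
    data = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, data)
    return data + [parity]


def decode(fragments: list, k: int, length: int) -> bytes:
    """Rebuild the chunk when at most one fragment (data or parity) is missing (None)."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("a single parity fragment can recover only one lost fragment")
    if missing and missing[0] < k:
        # XOR of all surviving fragments (including parity) equals the lost data fragment.
        present = [f for f in fragments if f is not None]
        fragments = list(fragments)
        fragments[missing[0]] = reduce(xor_bytes, present)
    return b"".join(fragments[:k])[:length]  # drop the zero padding
```

For example, `decode(frags, 4, len(chunk))` returns the original chunk after any one entry of `frags = encode(chunk, 4)` has been replaced by `None`.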

Patent Document 1: JP2013-182476

In recent years, devices that are sufficiently fast even for small I/O, such as SSDs (Solid State Drives), have grown in capacity and are expected to come into general use. Such devices reduce the need for defragmentation, which increases the likelihood that a fragmented file system image will be backed up. However, when such a file system is backed up in RAW form, it is difficult to achieve high deduplication efficiency even with a deduplication storage system.

For example, when a file system created on SAN (Storage Area Network) storage is RAW-backed up to deduplication storage and the file system has become heavily fragmented, updating only some of its files greatly lowers the deduplication rate. This is because files are not laid out contiguously in the file system image, so even if files with identical data exist, their data cannot be deduplicated.

Therefore, an object of the present invention is to solve the above-described problem that the efficiency of eliminating duplicate storage decreases when RAW backup is performed.

A storage system according to an aspect of the present invention has the following configuration. It includes:
image analysis means for analyzing RAW image data to be subjected to RAW backup; and
backup means for performing RAW backup by storing the RAW image data in a storage device while eliminating duplicate storage,
wherein the image analysis means analyzes the RAW image data and extracts, from the RAW image data, data that can be classified into predetermined groups, and
the backup means combines the data extracted by the image analysis means into one piece of storage data for each classified group, combines the data that were not extracted into one piece of storage data, and stores each of the resulting pieces of storage data.

A program according to an aspect of the present invention has the following configuration. It causes an information processing device to implement:
image analysis means for analyzing RAW image data to be subjected to RAW backup; and
backup means for performing RAW backup by storing the RAW image data in a storage device while eliminating duplicate storage,
wherein the image analysis means analyzes the RAW image data and extracts, from the RAW image data, data that can be classified into predetermined groups, and
the backup means combines the data extracted by the image analysis means into one piece of storage data for each classified group, combines the data that were not extracted into one piece of storage data, and stores each of the resulting pieces of storage data.

A backup method according to an aspect of the present invention has the following configuration:
analyzing RAW image data to be subjected to RAW backup;
performing RAW backup by storing the RAW image data in a storage device while eliminating duplicate storage;
when analyzing the RAW image data, extracting from it data that can be classified into predetermined groups; and
when performing the RAW backup, combining the data extracted by the analysis of the RAW image data into one piece of storage data for each classified group, combining the data that were not extracted into one piece of storage data, and storing each of the resulting pieces of storage data.

With the configuration described above, the present invention can suppress a decrease in the efficiency of eliminating duplicate storage even when RAW image data is RAW-backed up.

FIG. 1 is a block diagram showing the overall configuration of an information processing system including a deduplication storage apparatus according to the present invention.
FIG. 2 is a functional block diagram showing the configuration of the SAN device disclosed in FIG. 1.
FIG. 3 is a functional block diagram showing the configuration of the deduplication storage apparatus disclosed in FIG. 1.
FIG. 4 is a diagram showing how data is stored by the deduplication storage apparatus disclosed in FIG. 3.
FIG. 5 is a flowchart showing the operation of the information processing system disclosed in FIG. 1.
FIG. 6 is a flowchart showing the operation of the information processing system disclosed in FIG. 1.
FIG. 7 is a diagram showing another configuration example of the deduplication storage apparatus disclosed in FIG. 3.
FIG. 8 is a diagram showing data writing by the deduplication storage apparatus disclosed in FIG. 3.
FIG. 9 is a diagram showing data writing by the deduplication storage apparatus disclosed in FIG. 3.
FIG. 10 is a diagram showing another way in which data is stored by the deduplication storage apparatus disclosed in FIG. 3.
FIG. 11 is a diagram showing the configuration of the storage system in Supplementary Note 1 of the present invention.

<Embodiment 1>
A first embodiment of the present invention will be described with reference to FIGS. 1 to 10. FIG. 1 is a block diagram showing the overall configuration of the information processing system. FIG. 2 is a functional block diagram showing the configuration of the SAN device, and FIG. 3 is a functional block diagram showing the configuration of the deduplication storage apparatus. FIG. 4 is a diagram showing the processing performed when the deduplication storage apparatus RAW-backs up RAW image data. FIGS. 5 and 6 are flowcharts showing the operation of the information processing system. FIGS. 7 to 9 are diagrams showing data writing by the deduplication storage apparatus. FIG. 10 is a diagram showing another process performed when the deduplication storage apparatus RAW-backs up RAW image data.

[Configuration]
As shown in FIG. 1, the information processing system includes a SAN (Storage Area Network) device 1, a deduplication storage apparatus 2, and a business server device 3, which are connected to one another by communication paths. The communication path is, for example, a network connected by Ethernet (registered trademark) or Fibre Channel. This embodiment assumes that the business server device 3 uses a volume on the SAN device 1 and that this volume is backed up from the SAN device 1 to the deduplication storage apparatus 2.

As shown in FIG. 2, the SAN device 1 includes a communication unit 10, a difference map acquisition unit 11, a pairing unit 12, a replicate unit 13, a separate unit 14, and a difference information storage unit 15. Each of these units is configured either by a dedicated circuit or by a program installed in an arithmetic device.

The SAN device 1 also stores difference map information 16 in its storage device. In addition, the storage device of the SAN device 1 holds an MV (Master Volume) 17 and an RV (Replica Volume) 18, which are volumes for storing the data to be backed up.

The business server device 3, which updates and reads the data of the MV 17, can use the pairing unit 12, the replicate unit 13, and the separate unit 14 over the communication path. The pairing unit 12 pairs the MV 17 with the RV 18, after which the replicate unit 13 can copy data from the MV 17 to the RV 18. The separate unit 14 cancels the pairing between the MV 17 and the RV 18. Starting immediately after this separation (cancellation of the pairing), the difference information storage unit 15 records the blocks updated in the MV 17 as the difference map information 16. The deduplication storage apparatus 2 can acquire the difference map information 16 over the communication path by using the difference map acquisition unit 11 of the SAN device 1.
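The pair/replicate/separate cycle and the difference map can be modeled in a few lines. This is a toy sketch under stated assumptions: the class `VolumePair`, its methods, and the representation of the difference map as a set of block numbers are all hypothetical and are not an actual SAN device API. It only shows that updates to the MV are recorded once the pairing has been separated.

```python
class VolumePair:
    """Toy model of an MV/RV pair (hypothetical names, not a real SAN API)."""

    def __init__(self, num_blocks: int):
        self.mv = [b""] * num_blocks   # master volume blocks
        self.rv = [b""] * num_blocks   # replica volume blocks
        self.paired = False
        self.diff_map = set()          # block numbers updated on the MV since separation

    def pair(self):
        self.paired = True

    def replicate(self):
        if not self.paired:
            raise RuntimeError("volumes must be paired before replication")
        self.rv = list(self.mv)        # copy MV contents to RV

    def separate(self):
        self.paired = False
        self.diff_map.clear()          # start recording from the moment of separation

    def write(self, block: int, data: bytes):
        self.mv[block] = data
        if not self.paired:
            self.diff_map.add(block)   # remember the update for the next differential backup

    def difference_map(self) -> list:
        return sorted(self.diff_map)
```

After `pair()`, `replicate()`, and `separate()`, writes to blocks 3, 1, and 3 leave `difference_map()` equal to `[1, 3]` while the RV still holds the replicated snapshot.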

The deduplication storage apparatus 2 includes a communication unit 20, a RAW backup unit 21, a RAW image restore unit 22, a fragmentation determination unit 26, a RAW image analysis unit 27, an extraction block determination unit 28, and a file instant copy unit 29. Each of these units is configured either by a dedicated circuit or by a program installed in an arithmetic device.

The deduplication storage apparatus 2 also stores RAW image configuration information 25 in its storage device. Further, it has an FG file system 23 and a BG file system 24, each formed by a storage device and a file management function implemented by a program installed in the arithmetic device.

The RAW backup unit 21 (backup means) first uses the fragmentation determination unit 26 (distribution determination means) to determine, based on a predetermined criterion, whether data of a predetermined unit in the RAW image data to be backed up is fragmented, that is, stored in a scattered manner. Based on this determination, the RAW backup unit 21 decomposes the RAW image data and records it in the BG file system 24, storing the data while eliminating duplicate storage of data that is already stored. The RAW backup unit 21 further records, as the RAW image configuration information 25, the configuration information needed to reassemble the RAW image data stored in the BG file system 24 into the original file.

The extraction block determination unit 28 (image analysis means, backup means) extracts, from the block data into which the RAW image data has been divided, the data to be stored as separate files. For example, large files can yield high duplication rates and may therefore improve capacity efficiency, so the unit decides to extract the block data constituting such files and record them as separate files. Specifically, it extracts the block data constituting each file whose size exceeds a set threshold.
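The threshold rule can be sketched as follows. This assumes, hypothetically, that the RAW image analysis has already produced a mapping from each file name to its size in bytes and the block offsets it occupies in the image; the function name and that input format are illustrative only. Blocks of files larger than the threshold are grouped per file, and every other block offset falls into the remainder.

```python
def choose_extracted_blocks(file_layout: dict, threshold: int, total_blocks: int):
    """Select blocks of files larger than `threshold` bytes.

    file_layout maps a file name to (size_in_bytes, [block offsets in the RAW image]).
    Returns ({file name: [offsets]}, [offsets of blocks that were not extracted]).
    """
    extracted = {
        name: list(offsets)
        for name, (size, offsets) in file_layout.items()
        if size > threshold
    }
    taken = {o for offsets in extracted.values() for o in offsets}
    remaining = [o for o in range(total_blocks) if o not in taken]
    return extracted, remaining
```

With a 16 KiB threshold, a 24 KiB file scattered over even-numbered blocks is extracted on its own, while a 4 KiB file stays with the remaining blocks.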

The RAW image analysis unit 27 (image analysis means) is used by the fragmentation determination unit 26 and the extraction block determination unit 28 to perform the block data extraction described above. For example, when the RAW image data is a file system, the RAW image analysis unit 27 analyzes the size of each file contained in the RAW image data and the positions of the blocks constituting that file, and can provide this information to the extraction block determination unit 28.

The FG file system 23 (data output means) is a transparent file system that presents the original RAW image data, reconstructed from the RAW image data divided and recorded in the BG file system 24 as described later and from the RAW image configuration information 25.

The RAW image restore unit 22 (data output means) can restore the RAW image data, as it was before being divided, to the SAN device 1 by reading the RAW image data through the FG file system 23.

The file instant copy unit 29 creates a copy of a file by copying only its metadata. Because this is performed on the deduplication storage apparatus 2, the copy is deduplicated against the original, so no additional disk capacity is actually consumed.
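A metadata-only copy is cheap on a deduplicating store because a file is just a list of references to stored chunks. The toy `DedupFS` class below is a hypothetical illustration, not the apparatus's actual file management function: it assumes fixed 4-byte chunks, SHA-256 as the chunk hash, and per-chunk reference counts. `instant_copy` duplicates only the list of chunk hashes, so the number of stored bytes does not change.

```python
import hashlib


class DedupFS:
    """Toy deduplicating file store; file metadata is a list of chunk hashes (illustrative only)."""

    def __init__(self, chunk_size: int = 4):
        self.chunk_size = chunk_size
        self.chunks = {}   # hash -> chunk bytes (each unique chunk stored once)
        self.refs = {}     # hash -> reference count
        self.files = {}    # file name -> list of chunk hashes (metadata)

    def write(self, name: str, data: bytes):
        hashes = []
        for i in range(0, len(data), self.chunk_size):
            piece = data[i:i + self.chunk_size]
            h = hashlib.sha256(piece).hexdigest()
            self.chunks.setdefault(h, piece)          # store the chunk only if it is new
            self.refs[h] = self.refs.get(h, 0) + 1
            hashes.append(h)
        self.files[name] = hashes

    def instant_copy(self, src: str, dst: str):
        # Copy only the metadata and bump reference counts; no chunk data is duplicated.
        self.files[dst] = list(self.files[src])
        for h in self.files[dst]:
            self.refs[h] += 1

    def read(self, name: str) -> bytes:
        return b"".join(self.chunks[h] for h in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())
```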

[Operation]
Next, the operation of the information processing system configured as described above will be described with reference to the explanatory diagram of FIG. 4, the flowcharts of FIGS. 5 and 6, and the explanatory diagrams of FIGS. 7 to 9. The following description assumes that the MV 17 is backed up from the business server device 3.

First, the business server device 3 pairs the MV 17 with the RV 18 (step S1) and then performs replication to copy the data of the MV 17 to the RV 18 (step S2). If a backup taken at a quiescent point is required (step S3: Yes), business operations are stopped to create a quiescent point (step S4), and separation is then performed (step S5). Once separation is performed, the difference information storage unit 15 starts recording update information (updated blocks) for the MV 17 in the difference map information 16 (step S6).

The RAW backup unit 21 of the deduplication storage apparatus 2 then uses the difference map acquisition unit 11 to acquire the difference map information 16 recorded up to immediately before the separation (step S7). Using the acquired difference map information 16, the RAW backup unit 21 backs up the data of the RV 18 to the deduplication storage apparatus 2 (step S8).

The backup of the RV 18 data to the deduplication storage apparatus 2 in step S8 of FIG. 5 will be described in detail with reference to FIG. 6. First, the RAW backup unit 21 uses the fragmentation determination unit 26 to determine, from the difference map information 16, whether the RAW image data is fragmented. That is, the fragmentation determination unit 26 analyzes the difference map information 16, which is the difference information between Dynamic Data Replication generations or between snapshots, and estimates the degree to which data in the RAW image data is scattered, that is, the degree of fragmentation.

For example, as shown in steps S11 and S12 of FIG. 6, if at least a specified number of contiguous update blocks whose size is at most a specified value exist (step S11: Yes), and at least a specified number of locations exist where the offset from the end of one update block to the start of the next update block is at most a specified value (step S12: Yes), the difference from the previous generation's data is considered large, and the data is judged to be fragmented within the RAW image data (step S13). Based on this result, it is determined whether the process of extracting files from the RAW image data, described below, needs to be performed.
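The two-condition heuristic of steps S11 and S12 can be sketched as below. The thresholds (`max_run`, `min_small_runs`, `max_gap`, `min_close_gaps`) are hypothetical placeholders for the unspecified "specified values", and the input is assumed to be the set of updated block numbers taken from the difference map. Updated blocks are first merged into contiguous runs; the image is judged fragmented only when there are enough short runs (S11) and enough short gaps between consecutive runs (S12).

```python
def to_extents(blocks):
    """Merge updated block numbers into contiguous (start, length) runs."""
    extents = []
    for b in sorted(set(blocks)):
        if extents and b == extents[-1][0] + extents[-1][1]:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)
        else:
            extents.append((b, 1))
    return extents


def looks_fragmented(blocks, max_run=2, min_small_runs=3, max_gap=4, min_close_gaps=3):
    """Hypothetical thresholds; returns True when both S11 and S12 conditions hold."""
    extents = to_extents(blocks)
    # S11: enough short contiguous update runs?
    small_runs = sum(1 for _, length in extents if length <= max_run)
    # S12: enough places where the next run starts shortly after the previous one ends?
    close_gaps = sum(
        1
        for (s1, l1), (s2, _) in zip(extents, extents[1:])
        if s2 - (s1 + l1) <= max_gap
    )
    return small_runs >= min_small_runs and close_gaps >= min_close_gaps
```

Many small, closely spaced updates such as blocks `[0, 3, 5, 8, 10, 13]` are classified as fragmented, whereas one long contiguous update or a few widely separated ones are not.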

In the above description, the fragmentation determination unit 26 determines whether the data in the RAW image data is fragmented according to the size and arrangement of predetermined units of data, such as the update blocks in the difference map information 16; however, fragmentation may also be determined according to other criteria.

Next, when the data is judged to be fragmented, it is checked whether a previous-generation backup exists (step S15). If one exists (step S15: Yes), RAW backup is performed using the previous-generation backup image and the difference data on the SAN device 1 that the difference map information 16 acquired from the SAN device 1 indicates as updated (step S17). At this time, the metadata of the RAW image data formed from the previous-generation backup image and the difference data is analyzed, large files whose size exceeds the threshold are extracted, and the extracted files and the non-extracted data are each RAW-backed up while deduplication is performed.

The file extraction process described above will now be explained with reference to FIG. 4. In FIG. 4, “BK_A” denotes the RAW image data of the entire volume of the SAN device 1. First, this RAW image data is divided into fixed-length block data, and each block is assigned offset information indicating its position in the RAW image data. When analysis of the RAW image data “BK_A” extracts a large file “f1” whose size exceeds the threshold, the offset information of its blocks is stored in the RAW image configuration information 25 as “f1 index”. Here, it is assumed that “f1(1)” to “f1(6)” are extracted as the block data constituting the file “f1”. The offset information of the block data that were not extracted is likewise stored in the RAW image configuration information 25 as “Rem index”.

As shown in FIG. 4, the BG file system 24 then stores a file “f1” that combines the extracted block data into one, and the remaining data that combines the non-extracted block data into one. Because these are stored through the duplicate storage elimination process described later, deduplication efficiency increases, particularly for the file “f1”. Although only one file is extracted in the example of FIG. 4, when a plurality of files are extracted, the block data is grouped per file and each file is stored in the BG file system 24.
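Why grouping helps can be shown with a small experiment. The sketch below uses toy sizes that are assumptions (4-byte file-system blocks, 16-byte dedup chunks standing in for 64 KB chunks) and assumes an identical copy of file f1 has already been stored as a contiguous file. Chunking the fragmented image as-is produces chunks that mix f1 blocks with other blocks, so none of them match; extracting f1's blocks into one group reproduces f1's original chunks exactly.

```python
import hashlib

BLOCK = 4          # toy file-system block size in bytes (assumption)
CHUNK = 4 * BLOCK  # toy dedup chunk size, standing in for e.g. 64 KB


def store(index: set, data: bytes) -> int:
    """Add data's fixed-length chunks to a content-addressed index; return the number of new chunks."""
    new = 0
    for i in range(0, len(data), CHUNK):
        h = hashlib.sha256(data[i:i + CHUNK]).hexdigest()
        if h not in index:
            index.add(h)
            new += 1
    return new


def extract(image: bytes, offsets: list) -> tuple:
    """Split the image into the blocks at `offsets` (in order) and all other blocks."""
    blocks = [image[i:i + BLOCK] for i in range(0, len(image), BLOCK)]
    grouped = b"".join(blocks[o] for o in offsets)
    rest = b"".join(b for o, b in enumerate(blocks) if o not in offsets)
    return grouped, rest


# File f1 (8 blocks) interleaved block-by-block with other data in the RAW image.
f1_blocks = [f"F{i:03d}".encode() for i in range(8)]
other_blocks = [f"o{i:03d}".encode() for i in range(8)]
image = b"".join(a + b for a, b in zip(f1_blocks, other_blocks))
f1_offsets = list(range(0, 16, 2))  # block offsets of f1 inside the image

# An identical copy of f1 is assumed to be stored already as a contiguous file.
raw_index, grouped_index = set(), set()
store(raw_index, b"".join(f1_blocks))
store(grouped_index, b"".join(f1_blocks))

raw_new = store(raw_index, image)  # chunk the fragmented image as-is
f1_data, rem_data = extract(image, f1_offsets)
grouped_new = store(grouped_index, f1_data) + store(grouped_index, rem_data)
print(raw_new, grouped_new)
```

Running this prints `4 2`: the raw image needs four new chunks, while the grouped layout needs only the two chunks of the remaining data.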

The FG file system 23 presents the data as the single original RAW image data, based on the offset information in the RAW image configuration information 25 and the data in the BG file system 24. For example, each piece of block data in the BG file system 24 shown in FIG. 4 is placed back at the position indicated by the offset information of the RAW image configuration information 25, so that the data appears as the original RAW image data “BK_A”.
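The round trip between the divided form and the original image can be sketched as below. This is a minimal sketch under stated assumptions: block size is a toy 4 bytes, the image length is a multiple of the block size, and the configuration information is modeled as a plain dict from group name (for example "f1" or "Rem") to block offsets. `split_image` and `restore_image` are hypothetical names, not the FG file system's actual interface.

```python
BLOCK = 4  # toy block size in bytes (assumption); image length must be a multiple of it


def split_image(image: bytes, groups: dict):
    """Split a RAW image into per-group files plus a remainder, recording block offsets.

    groups maps a group name (e.g. "f1") to the block offsets extracted for it.
    Returns (bg_files, config); config plays the role of the RAW image configuration info.
    """
    blocks = [image[i:i + BLOCK] for i in range(0, len(image), BLOCK)]
    taken = {o for offs in groups.values() for o in offs}
    config = {name: list(offs) for name, offs in groups.items()}
    config["Rem"] = [o for o in range(len(blocks)) if o not in taken]
    bg_files = {name: b"".join(blocks[o] for o in offs) for name, offs in config.items()}
    return bg_files, config


def restore_image(bg_files: dict, config: dict) -> bytes:
    """Put every stored block back at the offset recorded for it."""
    total = sum(len(offs) for offs in config.values())
    blocks = [b""] * total
    for name, offs in config.items():
        data = bg_files[name]
        for k, o in enumerate(offs):
            blocks[o] = data[k * BLOCK:(k + 1) * BLOCK]
    return b"".join(blocks)
```

Splitting a 32-byte image with `{"f1": [0, 2, 4, 6]}` yields a "Rem" index of `[1, 3, 5, 7]`, and `restore_image` reassembles the original bytes exactly.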

Returning to step S15 in FIG. 6, if no previous-generation backup exists (step S15: No), a differential backup based on the difference map information 16 cannot be performed, so a full backup of the RAW image data is taken. In this case, the RAW backup unit 21 analyzes the metadata portion of the RAW image data on the SAN device 1, extracts large files whose size exceeds the threshold as described above, and backs up the extracted files and the non-extracted data while deduplicating each (step S18).

When the results of both steps S11 and S12 in FIG. 6 are “No”, the difference from the previous generation's data is considered small, and the data is judged not to be fragmented within the RAW image data (step S14). It is then checked whether a previous-generation backup exists (step S16). If one exists (step S16: Yes), the file instant copy unit 29 copies the previous-generation backup in the BG file system 24. Then, using the difference data on the SAN device 1 that the difference map information 16 acquired from the SAN device 1 indicates as updated, the corresponding portions of the copied backup image are overwritten with the updated data (step S19). Thus, when the data is judged not to be fragmented and a previous-generation backup exists, the backup is performed without the file extraction described above.

On the other hand, even when the data is judged not to be fragmented (step S14), if no previous-generation backup exists (step S16: No), a full backup of the RAW image data is taken. Therefore, as described above, the metadata portion of the RAW image data on the SAN device 1 is analyzed, large files whose size exceeds the threshold are extracted, and the extracted files and the non-extracted data are backed up while deduplicating each (step S18).
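The branch structure of steps S13 to S19 can be summarized in code. The sketch below is illustrative only: `plan_backup` returns the step label of the path taken, and `apply_diff` models step S19 with a plain list copy standing in for the metadata-only instant copy. Both names and the list-of-blocks image representation are assumptions.

```python
from typing import Optional


def plan_backup(fragmented: bool, previous: Optional[list]) -> str:
    """Mirror the branches of steps S13 to S19; returns the step that performs the backup."""
    if previous is None:
        return "S18"  # full backup: analyze metadata, extract large files, dedup each part
    if fragmented:
        return "S17"  # previous image + difference data, then extract large files and dedup
    return "S19"      # instant-copy the previous backup and overwrite updated blocks


def apply_diff(previous: list, updates: dict) -> list:
    """S19: start from a copy of the previous image and overwrite the updated blocks."""
    image = list(previous)  # stands in for the metadata-only instant copy
    for offset, data in updates.items():
        image[offset] = data
    return image
```

Without a previous generation, both branches end in a full backup (S18), which matches the flow described for steps S15 and S16.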

The RAW image restore unit 22 can read the data in the BG file system 24 based on the RAW image configuration information 25 and restore it to the SAN device 1 as a single piece of RAW image data.

An example of the process by which the deduplication storage apparatus 2 described above writes data while eliminating duplicate storage will now be described with reference to FIGS. 7 to 9. As shown in FIG. 7, the deduplication storage apparatus 2 includes, for example, a plurality of access nodes 5, which are server computers that control storage and retrieval operations in the storage apparatus 2 itself, and a plurality of storage nodes 6, which are server computers equipped with storage devices that store data. The numbers of access nodes 5 and storage nodes 6 are not limited to those shown in FIG. 7; more nodes 5 and 6 may be connected. Alternatively, the deduplication storage apparatus 2 may be configured as a single computer.

The deduplication storage apparatus 2 is a content address storage system that divides data, adds redundancy, distributes and stores the data across a plurality of storage devices, and specifies the storage location of the data by a unique content address set according to the content of the stored data.

The write process of the deduplication storage device 2 begins when it receives a file A to be stored, as indicated by arrow Y1 in FIGS. 8 and 9. File A is, for example, one of two pieces of data shown in FIG. 4. It may be the file "f1", in which the extracted block data are combined. It may also be the data in which the block data that were not extracted are combined.

Next, as indicated by arrow Y2 in FIGS. 8 and 9, the deduplication storage device 2 divides file A into chunks D, which are blocks of a predetermined size (for example, 64 KB). It then calculates, from the content of each chunk D, a unique hash value H that represents that content (arrow Y3 in FIG. 9). For example, H is computed from the content of chunk D with a preset hash function.
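The chunking and hashing step can be sketched as follows. SHA-256 from Python's standard `hashlib` is an assumed stand-in for the unspecified "preset hash function". The 64 KB chunk size is the example value from the text, and `chunk_and_hash` is a hypothetical name.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # 64 KB, the example chunk size given in the text


def chunk_and_hash(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[tuple[bytes, bytes]]:
    """Split `data` into fixed-size chunks D and compute a hash value H for each.

    SHA-256 is an assumed stand-in for the unspecified "preset hash function".
    Returns (H, D) pairs in file order; the last chunk may be shorter.
    """
    pairs = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        pairs.append((hashlib.sha256(chunk).digest(), chunk))
    return pairs
```

Chunks with identical content always produce identical hash values, which is what makes the subsequent duplicate lookup possible.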

Next, the hash value H of each chunk D of file A is used to check whether that chunk D has already been stored. Stored chunks D are registered in an MFI (Main Fragment Index) file, which holds each hash value H together with the content address CA that represents its storage location. If the hash value H computed for chunk D before storage is present in the MFI file, a chunk D with identical content has already been stored. In that case, the content address CA associated with the matching hash value H is obtained from the MFI file. It is returned as the content address CA of the chunk D for which the write was requested.

The already stored data referenced by the returned content address CA is then used as the chunk D requested to be written. The area referenced by the returned content address CA is designated as the storage destination of the requested chunk D, and that chunk D is treated as stored. In this way, when the chunk D in a write request is determined to be a duplicate, the write completes without any data actually being written.
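The duplicate-check path can be illustrated with a toy in-memory store. A Python dict stands in for the MFI file, mapping hash value H to content address CA. Content addresses are simply sequential integers here, which is a hypothetical simplification and not the CA format described later. `DedupStore` is an illustrative name, not part of the publication.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store with an MFI-like hash index.

    `mfi` maps a chunk's hash value H to its content address CA, and `blocks`
    stands in for physical storage. CAs here are sequential integers, a
    hypothetical simplification of the real content address.
    """

    def __init__(self) -> None:
        self.mfi: dict[bytes, int] = {}
        self.blocks: dict[int, bytes] = {}
        self.physical_writes = 0

    def write_chunk(self, chunk: bytes) -> int:
        h = hashlib.sha256(chunk).digest()
        ca = self.mfi.get(h)
        if ca is not None:
            # Duplicate: return the existing CA; no data is written.
            return ca
        ca = len(self.blocks)
        self.blocks[ca] = chunk
        self.mfi[h] = ca
        self.physical_writes += 1
        return ca

    def read_chunk(self, ca: int) -> bytes:
        return self.blocks[ca]
```

Writing the same chunk twice returns the same CA both times and performs only one physical write.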

If, on the other hand, the chunk D in the write request is determined not to be a duplicate (that is, not yet stored), it is written as follows.

First, chunk D is compressed and divided into a plurality of fragment data of a predetermined size, as indicated by arrow Y5 in FIG. 9. For example, as indicated by reference signs D1 to D9 in FIG. 8, chunk D is divided into nine fragment data (divided data F1).

Redundant data is then generated so that the original chunk D can be restored even if some of the divided fragments are lost, and it is added to the divided fragment data F1. For example, as indicated by reference signs D10 to D12 in FIG. 8, three fragment data (redundant data F2) are added. The result is a data set of 12 fragment data consisting of nine divided data F1 and three redundant data F2.
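The compress-then-fragment step can be sketched with the standard `zlib` module. The publication does not name its redundancy code, and its scheme adds three redundant fragments. The sketch below deliberately swaps in a much simpler technique: a single XOR parity fragment, which can recover only one lost fragment. The names `encode_chunk` and `decode_chunk` are hypothetical.

```python
import zlib
from functools import reduce

N_DATA = 9  # number of divided data fragments F1 (D1..D9 in the text)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode_chunk(chunk: bytes, n_data: int = N_DATA) -> tuple[list[bytes], int]:
    """Compress a chunk, split it into equal-size data fragments, and append
    one XOR parity fragment (a simplified stand-in for the 3 redundant
    fragments in the text). Returns (fragments, compressed_length)."""
    comp = zlib.compress(chunk)
    frag_len = -(-len(comp) // n_data)  # ceiling division
    padded = comp.ljust(frag_len * n_data, b"\0")
    data = [padded[i * frag_len:(i + 1) * frag_len] for i in range(n_data)]
    parity = reduce(xor_bytes, data)
    return data + [parity], len(comp)


def decode_chunk(frags: list, comp_len: int) -> bytes:
    """Rebuild the chunk; at most one fragment may be missing (None)."""
    lost = [i for i, f in enumerate(frags) if f is None]
    if len(lost) > 1:
        raise ValueError("single parity can recover only one lost fragment")
    if lost:
        # XOR of all surviving fragments (including parity) equals the lost one.
        frags = list(frags)
        frags[lost[0]] = reduce(xor_bytes, [f for f in frags if f is not None])
    comp = b"".join(frags[:-1])[:comp_len]  # drop parity and padding
    return zlib.decompress(comp)
```

A real 9+3 layout that tolerates any three losses would need an erasure code such as Reed-Solomon. Single parity was chosen only to keep the example short and verifiable.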

Next, the fragment data making up the generated data set are distributed across a plurality of storage devices. Each fragment is stored in a data storage area called a component (reference signs 01 to 12), located in one of the storage devices (see arrow Y6 in FIG. 9).
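The text does not say how components are mapped to storage nodes. The sketch below assumes a simple round-robin placement of the 12 components and checks how many fragments a single node failure would remove. With four nodes, each node holds three fragments, which matches the three redundant fragments of a 9+3 code. With three nodes, each holds four, which is one too many. The functions `place_fragments` and `survives_node_loss` and the node names are hypothetical.

```python
from collections import Counter


def place_fragments(n_fragments: int, nodes: list[str]) -> dict[int, str]:
    """Assign component k (1-based, as in 01..12) to a storage node round-robin."""
    return {k: nodes[(k - 1) % len(nodes)] for k in range(1, n_fragments + 1)}


def survives_node_loss(placement: dict[int, str], max_lost: int) -> bool:
    """True if losing any single node removes at most `max_lost` fragments."""
    per_node = Counter(placement.values())
    return max(per_node.values()) <= max_lost
```

For example, `survives_node_loss(place_fragments(12, ["n1", "n2", "n3", "n4"]), 3)` is `True`, while the same check with three nodes is `False`.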

Next, a content address CA is generated and managed. It represents the storage location of the fragment data D1 to D12 stored as described above, that is, the location of the chunk D that can be restored from those fragments. Specifically, CA is formed by combining two parts. The first is part of the hash value H calculated from the content of the stored chunk D (a short hash, for example the first 8 bytes of H). The second is information indicating the logical storage location. This content address CA is returned to the file system service (arrow Y7 in FIG. 9). Identification information of the stored data, such as its file name, is associated with CA and managed in the MFI file described above (arrow Y8 in FIG. 9).
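Forming a content address from a short hash and a logical location can be sketched with `hashlib` and `struct`. The first-8-bytes short hash follows the example in the text. SHA-256 and the 8-byte big-endian encoding of the logical location are assumptions. `make_content_address` and `parse_content_address` are hypothetical names.

```python
import hashlib
import struct

SHORT_HASH_LEN = 8  # "the first 8 bytes of the hash value H" in the text


def make_content_address(chunk: bytes, logical_location: int) -> bytes:
    """Content address CA = short hash of the chunk + logical storage location.

    SHA-256 and the 8-byte big-endian location field are assumptions.
    """
    short_hash = hashlib.sha256(chunk).digest()[:SHORT_HASH_LEN]
    return short_hash + struct.pack(">Q", logical_location)


def parse_content_address(ca: bytes) -> tuple[bytes, int]:
    """Split a CA back into its short hash and logical location."""
    (location,) = struct.unpack(">Q", ca[SHORT_HASH_LEN:])
    return ca[:SHORT_HASH_LEN], location
```

Putting the short hash first means two CAs for identical content share a common prefix. The location part tells the system where the fragments actually live.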

As described above, the information processing system of this embodiment extracts files from the RAW image data and backs them up separately from the remaining data, which improves the deduplication ratio. This holds even for a RAW backup of RAW image data in which the data is fragmented. Because the extracted files are stored separately, their duplication rate increases and the backup achieves a high deduplication ratio.
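A toy model illustrates why separating the files helps with fixed-size chunk deduplication. It assumes 4 KiB blocks and 64 KB chunks, and it treats fragmentation as an unchanged 32-block file interleaved with "other" blocks that change between two backup generations. In the raw image, every chunk mixes file and other blocks, so no chunk repeats across generations. In the split streams, the file's chunks repeat exactly. The block contents are synthetic, and the helper names are hypothetical.

```python
import hashlib

BLOCK = 4096        # assumed block size
CHUNK = 64 * 1024   # 64 KB chunks, i.e. 16 blocks per chunk


def block(tag: str, i: int) -> bytes:
    # Deterministic synthetic 4 KiB block, distinct for each (tag, i).
    return hashlib.sha256(f"{tag}-{i}".encode()).digest() * (BLOCK // 32)


def generation(gen: int, n: int = 32) -> tuple[bytes, bytes, bytes]:
    """Return (raw image, extracted file stream, rest stream) for one backup.

    The file is unchanged across generations; the interleaved other blocks
    are rewritten in every generation (a simple fragmentation model).
    """
    file_blocks = [block("file", i) for i in range(n)]
    other = [block(f"other-g{gen}", i) for i in range(n)]
    raw = b"".join(f + o for f, o in zip(file_blocks, other))
    return raw, b"".join(file_blocks), b"".join(other)


def unique_chunks(generations: list[list[bytes]]) -> int:
    """Count distinct 64 KB chunks a dedup store would keep for all streams."""
    seen = set()
    for streams in generations:
        for s in streams:
            for off in range(0, len(s), CHUNK):
                seen.add(hashlib.sha256(s[off:off + CHUNK]).digest())
    return len(seen)


g1, g2 = generation(1), generation(2)
raw_total = unique_chunks([[g1[0]], [g2[0]]])                 # 8 chunks stored
split_total = unique_chunks([[g1[1], g1[2]], [g2[1], g2[2]]])  # 6 chunks stored
```

Both approaches write eight 64 KB chunks logically. The raw backup stores all eight, while the split backup stores six because the file's two chunks deduplicate in the second generation.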

Because RAW backup achieves a high deduplication ratio in this way, the file system does not need to be defragmented, and the user need not track the degree of fragmentation. Moreover, even if defragmentation is performed in some backup generation, that backup can still deduplicate against backups of past generations.

The same applies to backups from different storage devices. If their file system images contain similar files, the RAW backups are more likely to deduplicate against each other in the deduplication storage, further improving deduplication efficiency. As a result, the deduplication ratio is comparable to that of file-by-file backup through a backup server. In other words, a high deduplication ratio is obtained without a separate backup server or backup software, which reduces cost.

In the above description, files whose size exceeds a threshold are extracted from the RAW image data, but files may be extracted according to other criteria. Furthermore, the extracted data is not limited to files. Any data that can be classified into predetermined groups by other criteria may be extracted. For example, data may be classified by update time, that is, by the generation in which it was last updated, and the data of a given generation may be extracted and combined into one piece of data for backup. A specific example is described with reference to FIG. 10.

First, the deduplication storage device 2 records, for each block (a predetermined unit of data) of the RAW image data, the generation in which that block was last updated. As the number of backup generations grows, blocks that have not been updated for a long time (a set period) are extracted and combined into one piece of data. The non-extracted blocks are likewise combined into one piece of data. FIG. 10 shows an example in which the third-generation blocks are extracted and combined into one piece of data, and the blocks of the other generations are combined into another. Each combined piece of data is then deduplicated and backed up.
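Grouping blocks by last-update generation can be sketched as follows. The sketch assumes a per-block list of last-update generation numbers, and it measures the "set period" in generations through a hypothetical parameter `min_age`. Stale blocks are grouped by the generation in which they were last updated, and everything else forms the rest group. The function name `group_by_stale_generation` is illustrative.

```python
from collections import defaultdict


def group_by_stale_generation(last_updated: list[int], current_gen: int,
                              min_age: int) -> tuple[dict[int, list[int]], list[int]]:
    """Split block indices into per-generation groups of stale blocks and a rest list.

    last_updated[i] is the backup generation in which block i was last
    updated. A block is treated as stale if it has not changed for at least
    `min_age` generations (the set period); stale blocks are grouped by that
    generation, and all other blocks go to `rest`.
    """
    groups: dict[int, list[int]] = defaultdict(list)
    rest: list[int] = []
    for idx, gen in enumerate(last_updated):
        if current_gen - gen >= min_age:
            groups[gen].append(idx)
        else:
            rest.append(idx)
    return dict(groups), rest
```

In a FIG. 10-like case, `group_by_stale_generation([5, 3, 3, 6, 3, 5], 6, 3)` extracts the third-generation blocks `{3: [1, 2, 4]}` and leaves `[0, 3, 5]` in the rest group. Each group would then be combined into one piece of data and deduplicated.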

Blocks of a generation that have not been updated are considered unlikely to be updated in the future. Storing the blocks of such a generation together as one piece of data therefore increases the likelihood of duplication across backup generations and improves the deduplication ratio. The above example extracts data of generations that have not been updated for a long time, but data of a given generation may be extracted according to other criteria.

In the above description, the RAW image data is divided into block data and data satisfying a certain criterion is extracted. Alternatively, data that can be classified into groups such as the files or generations described above may be extracted by another analysis method, without dividing the RAW image data into block data.

<Appendix>
Part or all of the above embodiment can also be described as in the following supplementary notes. The configurations of the storage system (see FIG. 11), program, and backup method according to the present invention are outlined below. However, the present invention is not limited to the following configurations.

(Appendix 1)
A storage system 100 comprising:
image analysis means 101 for analyzing RAW image data to be subjected to RAW backup; and
backup means 102 for performing RAW backup by storing the RAW image data in a storage device 110 while eliminating duplicate storage,
wherein the image analysis means 101 analyzes the RAW image data and extracts, from the RAW image data, data that can be classified into predetermined groups, and
the backup means 102 combines the data extracted by the image analysis means into one piece of storage data for each of the classified groups, combines the data that was not extracted into one piece of storage data, and stores each piece of combined storage data.

(Appendix 2)
The storage system according to Appendix 1, wherein
the image analysis means divides the RAW image data into block data of a predetermined size and extracts the block data by classifying it into the groups, and
the backup means combines the block data extracted by the image analysis means into one piece of storage data for each of the classified groups, combines the block data that was not extracted into one piece of storage data, and stores each piece of combined storage data.

(Appendix 3)
The storage system according to Appendix 1 or 2, wherein
the image analysis means extracts, from the RAW image data, data that can be classified into files satisfying a predetermined criterion set as the groups, and
the backup means combines the data extracted by the image analysis means into one piece of storage data for each of the classified files, combines the data that was not extracted into one piece of storage data, and stores each piece of combined storage data.

(Appendix 4)
The storage system according to Appendix 3, wherein
the image analysis means extracts, from the RAW image data, data that can be classified into the files, set as the groups, whose size exceeds a threshold.

(Appendix 5)
The storage system according to Appendix 1 or 2, wherein
the image analysis means extracts, from the RAW image data, data that can be classified into data groups, set as the groups, whose data update time satisfies a predetermined criterion, and
the backup means combines the data extracted by the image analysis means into one piece of storage data for each of the classified data groups, combines the data that was not extracted into one piece of storage data, and stores each piece of combined storage data.

(Appendix 6)
The storage system according to Appendix 5, wherein
the image analysis means extracts, from the RAW image data, data that can be classified into the data groups, set as the groups, whose data share the same update time.

(Appendix 7)
The storage system according to any one of Appendices 1 to 6, wherein
the backup means stores position information indicating where, within the RAW image data, each piece of data combined into each piece of storage data was located,
the storage system further comprising data output means for outputting the stored pieces of storage data as one piece of RAW image data based on the stored position information.

(Appendix 8)
The storage system according to any one of Appendices 1 to 7, further comprising
distribution determination means for determining how data is distributed within the RAW image data, according to the size and arrangement of data of a predetermined unit within the RAW image data,
wherein the image analysis means and the backup means are configured to operate according to the determination result of the distribution determination means.

According to the above invention, the image analysis means first analyzes the RAW image data and extracts data that can be classified into groups. For example, it extracts the data making up files whose size is equal to or greater than a threshold, or the data of generations judged not to have been updated for some time. To do this, the RAW image data may be divided into block data of a predetermined size and classified into groups, such as files or generations, that satisfy a predetermined criterion.

The backup means then combines the data extracted by the image analysis means, and separately the data not extracted, into pieces of storage data. For example, the extracted data is combined per classified file or per generation. Both kinds of combined storage data are stored in the storage device with duplicate storage eliminated, which completes the RAW backup.

In this way, even with RAW backup, large files, data of generations that have not been updated, and similar data are stored together, so duplicate storage is eliminated more efficiently.

(Appendix 9)
A program for causing an information processing device to implement:
image analysis means for analyzing RAW image data to be subjected to RAW backup; and
backup means for performing RAW backup by storing the RAW image data in a storage device while eliminating duplicate storage,
wherein the image analysis means analyzes the RAW image data and extracts, from the RAW image data, data that can be classified into predetermined groups, and
the backup means combines the data extracted by the image analysis means into one piece of storage data for each of the classified groups, combines the data that was not extracted into one piece of storage data, and stores each piece of combined storage data.

(Appendix 9.1)
The program according to Appendix 9, wherein
the image analysis means divides the RAW image data into block data of a predetermined size and extracts the block data by classifying it into the groups, and
the backup means combines the block data extracted by the image analysis means into one piece of storage data for each of the classified groups, combines the block data that was not extracted into one piece of storage data, and stores each piece of combined storage data.

(Appendix 9.2)
The program according to Appendix 9 or 9.1, wherein
the image analysis means extracts, from the RAW image data, data that can be classified into files satisfying a predetermined criterion set as the groups, and
the backup means combines the data extracted by the image analysis means into one piece of storage data for each of the classified files, combines the data that was not extracted into one piece of storage data, and stores each piece of combined storage data.

(Appendix 9.3)
The program according to Appendix 9 or 9.1, wherein
the image analysis means extracts, from the RAW image data, data that can be classified into data groups, set as the groups, whose data update time satisfies a predetermined criterion, and
the backup means combines the data extracted by the image analysis means into one piece of storage data for each of the classified data groups, combines the data that was not extracted into one piece of storage data, and stores each piece of combined storage data.

(Appendix 10)
A backup method comprising:
analyzing RAW image data to be subjected to RAW backup; and
performing RAW backup by storing the RAW image data in a storage device while eliminating duplicate storage,
wherein, in analyzing the RAW image data, data that can be classified into predetermined groups is extracted from the RAW image data, and
in performing the RAW backup, the data extracted by the analysis is combined into one piece of storage data for each of the classified groups, the data that was not extracted is combined into one piece of storage data, and each piece of combined storage data is stored.

(Appendix 10.1)
The backup method according to Appendix 10, wherein
in analyzing the RAW image data, the RAW image data is divided into block data of a predetermined size, and the block data is extracted by classifying it into the groups, and
in performing the RAW backup, the block data extracted by the analysis is combined into one piece of storage data for each of the classified groups, the block data that was not extracted is combined into one piece of storage data, and each piece of combined storage data is stored.

(Appendix 10.2)
The backup method according to Appendix 10 or 10.1, wherein
in analyzing the RAW image data, data that can be classified into files satisfying a predetermined criterion set as the groups is extracted from the RAW image data, and
in performing the RAW backup, the data extracted by the analysis is combined into one piece of storage data for each of the classified files, the data that was not extracted is combined into one piece of storage data, and each piece of combined storage data is stored.

(Appendix 10.3)
The backup method according to Appendix 10 or 10.1, wherein
in analyzing the RAW image data, data that can be classified into data groups, set as the groups, whose data update time satisfies a predetermined criterion is extracted from the RAW image data, and
in performing the RAW backup, the data extracted by the analysis is combined into one piece of storage data for each of the classified data groups, the data that was not extracted is combined into one piece of storage data, and each piece of combined storage data is stored.

The program described above is stored in a storage device or recorded on a computer-readable recording medium. The recording medium is, for example, a portable medium such as a flexible disk, an optical disk, a magneto-optical disk, or a semiconductor memory.

Although the present invention has been described above with reference to the embodiment and related examples, it is not limited to that embodiment. Various changes that those skilled in the art can understand may be made to the configuration and details of the present invention within its scope.

DESCRIPTION OF SYMBOLS
1 SAN device
10 Communication unit
11 Difference map acquisition unit
12 Pairing unit
13 Replicate unit
14 Separate unit
15 Difference information storage unit
16 Difference map information
17 MV
18 RV
2 Deduplication storage device
20 Communication unit
21 RAW backup unit
22 RAW image restore unit
23 FG file system
24 BG file system
25 RAW image configuration information
26 Fragmentation determination unit
27 RAW image analysis unit
28 Extraction block determination unit
29 File instantaneous copy unit
3 Business server device
100 Storage system
101 Image analysis means
102 Backup means
110 Storage device

Claims

1. A storage system comprising:
image analysis means for analyzing RAW image data to be subjected to RAW backup; and
backup means for performing RAW backup by storing the RAW image data in a storage device while eliminating duplicate storage,
wherein the image analysis means analyzes the RAW image data and extracts, from the RAW image data, data that can be classified into predetermined groups, and
the backup means combines the data extracted by the image analysis means into one piece of storage data for each of the classified groups, combines the data that was not extracted into one piece of storage data, and stores each piece of combined storage data.

2. The storage system according to claim 1, wherein
the image analysis means divides the RAW image data into block data of a predetermined size and extracts the block data by classifying it into the groups, and
the backup means combines the block data extracted by the image analysis means into one piece of storage data for each of the classified groups, combines the block data that was not extracted into one piece of storage data, and stores each piece of combined storage data.

3. The storage system according to claim 1 or 2, wherein
the image analysis means extracts, from the RAW image data, data that can be classified into files satisfying a predetermined criterion set as the groups, and
the backup means combines the data extracted by the image analysis means into one piece of storage data for each of the classified files, combines the data that was not extracted into one piece of storage data, and stores each piece of combined storage data.

4. The storage system according to claim 3, wherein
the image analysis means extracts, from the RAW image data, data that can be classified into the files, set as the groups, whose size exceeds a threshold.

5. The storage system according to claim 1 or 2, wherein
the image analysis means extracts, from the RAW image data, data that can be classified into data groups, set as the groups, whose data update time satisfies a predetermined criterion, and
the backup means combines the data extracted by the image analysis means into one piece of storage data for each of the classified data groups, combines the data that was not extracted into one piece of storage data, and stores each piece of combined storage data.

6. The storage system according to claim 5, wherein
the image analysis means extracts, from the RAW image data, data that can be classified into the data groups, set as the groups, whose data share the same update time.

7. The storage system according to any one of claims 1 to 6, wherein
the backup means stores position information indicating where, within the RAW image data, each piece of data combined into each piece of storage data was located,
the storage system further comprising data output means for outputting the stored pieces of storage data as one piece of RAW image data based on the stored position information.

8. The storage system according to any one of claims 1 to 7, further comprising
distribution determination means for determining how data is distributed within the RAW image data, according to the size and arrangement of data of a predetermined unit within the RAW image data,
wherein the image analysis means and the backup means are configured to operate according to the determination result of the distribution determination means.

9. A program for causing an information processing device to implement:
image analysis means for analyzing RAW image data to be subjected to RAW backup; and
backup means for performing RAW backup by storing the RAW image data in a storage device while eliminating duplicate storage,
wherein the image analysis means analyzes the RAW image data and extracts, from the RAW image data, data that can be classified into predetermined groups, and
the backup means combines the data extracted by the image analysis means into one piece of storage data for each of the classified groups, combines the data that was not extracted into one piece of storage data, and stores each piece of combined storage data.

10. A backup method comprising:
analyzing RAW image data to be subjected to RAW backup; and
performing RAW backup by storing the RAW image data in a storage device while eliminating duplicate storage,
wherein, in analyzing the RAW image data, data that can be classified into predetermined groups is extracted from the RAW image data, and
in performing the RAW backup, the data extracted by the analysis is combined into one piece of storage data for each of the classified groups, the data that was not extracted is combined into one piece of storage data, and each piece of combined storage data is stored.