JP2020149229A

JP2020149229A - Duplicate eliminating apparatus, duplicate eliminating method, program and storage media

Info

Publication number: JP2020149229A
Application number: JP2019045030A
Authority: JP
Inventors: 知広猪鹿倉; Tomohiro Ikakura
Original assignee: NEC Corp; NEC Solution Innovators Ltd
Current assignee: NEC Corp; NEC Solution Innovators Ltd
Priority date: 2019-03-12
Filing date: 2019-03-12
Publication date: 2020-09-17

Abstract

To provide a duplicate eliminating apparatus capable of low-cost, short-time processing while minimizing high-load processing. SOLUTION: In a duplicate eliminating apparatus 1 of the present invention, an identification unit 11 identifies the type of operating system information based on file system information present in a first area of a data group stored in a data storage area that has identification information in a storage; a duplicate area detection unit 12 detects areas in which duplicate data exists in the data groups that have the same type of operating system information in the data storage area; and a duplicate eliminating unit 13 stores the duplicate data in at least one of those areas and eliminates it from the other areas. SELECTED DRAWING: Figure 1

Description

The present invention relates to a deduplication device, a deduplication method, a program, and a recording medium.

Conventionally, techniques that eliminate duplicated data from a storage have been used to reduce storage capacity costs (Patent Document 1).

Patent Document 1: Japanese Patent No. 4990828

However, comparing all of the data in a storage to check for duplication is a high-load process. On low-performance hardware, this processing therefore takes a long time. On the other hand, high-performance hardware that can complete such high-load processing in a short time is expensive, which makes it difficult to deploy.

Accordingly, an object of the present invention is to provide a deduplication device, a deduplication method, a program, and a recording medium that minimize high-load processing and enable short-time processing at low cost.

To achieve the above object, the deduplication device of the present invention
includes an identification unit, a duplicate area detection unit, and a deduplication unit, wherein
the identification unit identifies the type of operating system information based on file system information present in a first area of a data group stored in a data storage area that has identification information in a storage;
the duplicate area detection unit detects, in the data storage area, areas in which duplicate data exists from the data groups having the same type of operating system information; and
the deduplication unit stores the duplicate data in at least one of the areas and eliminates the duplicate data from the other areas.

The deduplication method of the present invention
includes an identification step, a duplicate area detection step, and a deduplication step, wherein
in the identification step, the type of operating system information is identified based on file system information present in a first area of a data group stored in a data storage area that has identification information in a storage;
in the duplicate area detection step, areas in which duplicate data exists are detected, in the data storage area, from the data groups having the same type of operating system information; and
in the deduplication step, the duplicate data is stored in at least one of the areas and eliminated from the other areas.

According to the present invention, for example, high-load processing can be minimized at low cost and processing can be completed in a short time.

FIG. 1 is a block diagram showing an example configuration of the device of the first embodiment.
FIG. 2 is a block diagram showing an example hardware configuration of the device of the first embodiment.
FIG. 3 is a flowchart showing an example of processing in the device of the first embodiment.
FIG. 4 is a schematic diagram showing an example configuration of a data group in the device of the first embodiment.
FIG. 5 is a schematic diagram showing an example in which the storage is virtualized storage in the device of the first embodiment.
FIG. 6 is a block diagram showing an example configuration of the device of the second embodiment.
FIG. 7 is a schematic diagram showing an example of identifying data storage areas in the device of the second embodiment.

The device of the present invention may further include a data storage area identification unit. In this aspect, the data storage area identification unit uses the identification information to identify the data storage areas having the same type of operating system information, and the duplicate area detection unit detects, across the identified data storage areas, areas in which duplicate data exists from the data groups having the same type of operating system information.

In the device of the present invention, the identification unit may identify the type of operating system information based on file system information present in the first area of a virtual disk area of the data group in a virtualized storage.

The method of the present invention may further include a data storage area identification step. In this aspect, the data storage area identification step uses the identification information to identify the data storage areas having the same type of operating system information, and the duplicate area detection step detects, across the identified data storage areas, areas in which duplicate data exists from the data groups having the same type of operating system information.

In the method of the present invention, the identification step may identify the type of operating system information based on file system information present in the first area of a virtual disk area of the data group in a virtualized storage.

The program of the present invention is a program that can execute the method of the present invention on a computer.

The recording medium of the present invention is a computer-readable recording medium on which the program of the present invention is recorded.

Duplicate data refers to data having identical contents that arises, for example, from the OS areas of operating systems (hereinafter, OS) of the same type, or from data backups and copies. In general, duplicated data tends to be stored under the same OS. For example, for data in OS areas to be duplicated, the same OS must be in use. In the case of a backup, the OS is duplicated as well, so the OS is the same. In the case of a copy, the copy is typically made to retain past data, so the OS is also expected to be the same.

Embodiments of the present invention will be described with reference to the drawings. The present invention is not limited to the following embodiments. In the following figures, identical parts are given identical reference signs. Unless otherwise specified, the description of each embodiment applies to the others, and the configurations of the embodiments can be combined.

[Embodiment 1]
FIG. 1 is a block diagram showing an example configuration of the deduplication device 1 of the present embodiment. As shown in FIG. 1, the device 1 includes an identification unit 11, a duplicate area detection unit 12, and a deduplication unit 13.

The device 1 may be, for example, a single device that includes all of the above units, or a set of devices in which the units are connected via a communication network. Although not shown, the device 1 can also be connected to an external device, described later, via the communication network. The communication network is not particularly limited; any known network, wired or wireless, may be used, such as an Internet line, a telephone line, a LAN (Local Area Network), or WiFi (Wireless Fidelity). The device 1 may, for example, be incorporated into a server as a system. The device 1 may also be a terminal on which the program of the present invention is installed. The terminal is not particularly limited; examples include a personal computer (PC), a smartphone, a tablet, and a mobile phone.

FIG. 2 is a block diagram illustrating the hardware configuration of the device 1. The device 1 includes, for example, a CPU (central processing unit) 101, a memory 102, a bus 103, a storage device 104, an input device 105, a display device (display) 106, and a communication device 107. The device 1 is, for example, a computer 100 in which the program of the present invention is stored in the storage device 104. The computer 100 is a general-purpose computer. The components of the device 1 are connected to one another via the bus 103 through their respective interfaces (I/F).

The CPU 101 controls the device 1 as a whole. In the device 1, the CPU 101 executes, for example, the program of the present invention and other programs, and reads and writes various kinds of information. In the present invention, another processing unit such as a GPU may be used instead of the CPU.

The bus 103 can also be connected to external devices, such as an external storage device (external storage or the like) described later, or a printer. The device 1 can be connected to an external network (communication network) 3 through the communication device 107 connected to the bus 103, and can also be connected to the external devices via the external network 3.

The memory 102 is, for example, a main memory (main storage device). When the CPU 101 performs processing, the memory 102 loads various operation programs, such as the program of the present invention stored in the storage device 104 described later, and the CPU 101 receives the data from the memory 102 and executes the program. The main memory is, for example, a RAM (random access memory). The memory 102 may also be, for example, a ROM (read-only memory).

The storage device 104 is what is called an auxiliary storage device, as opposed to the main memory (main storage device). As described above, the storage device 104 stores operation programs including the program of the present invention. The storage device 104 may be, for example, a combination of a storage medium and a drive that reads from and writes to the medium. The storage medium is not particularly limited and may be internal or external; examples include an HD (hard disk), CD-ROM, CD-R, CD-RW, MO, DVD, flash memory, and memory card. The storage device 104 may also be, for example, a hard disk drive (HDD) in which the storage medium and the drive are integrated. The storage device 104 may also store the OS.

In other words, the device 1 in FIG. 2 can be described as a computer 100 in which the memory 102 loads the program of the present invention and the CPU 101 receives data from the memory 102 and executes that program.

The device 1 further includes, for example, the input device 105 and the display 106. The input device 105 is, for example, a touch panel, a keyboard, or a mouse. The display 106 is, for example, an LED display or a liquid crystal display.

The identification unit 11 identifies the type of operating system information (hereinafter also referred to as OS information) based on file system information present in the first area of a data group stored in a data storage area that has identification information in a storage. The identification information is, for example, a LUN (Logical Unit Number). The data group is data in which a file system records information in the areas allocated to each file or directory. In the present embodiment, the first area is the earliest of these areas, the one from which reading starts, and it is where the file system information is located.

FIG. 4 shows an example configuration of the data group. In FIG. 4, the data group is read starting from the left. As described above, the file system information is located in the first area. An OS file recording OS information, for example, is located after (to the right of) the file system information. Other files, such as an AP file recording active page (AP) information, are located after (to the right of) the OS file.
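The document does not say how the file system information in the first area is turned into an OS type. As a minimal sketch, assuming the first area is the start of a volume (not a partitioned disk with an MBR or GPT in front of it), one could check a few well-known on-disk signatures and map the detected file system to a coarse OS type. The `FS_TO_OS` table and both function names are hypothetical; a file system alone does not determine an OS version, so a real identification unit would likely read further OS information as well.

```python
import struct
from typing import Optional

# Hypothetical mapping from a detected file system to a coarse OS type.
# The patent defines no such table; this is an illustrative assumption.
FS_TO_OS = {"ntfs": "windows", "fat32": "windows", "ext": "linux", "xfs": "linux"}


def detect_file_system(first_area: bytes) -> Optional[str]:
    """Guess the file system from the leading bytes of a volume."""
    # NTFS boot sector: OEM ID "NTFS    " at byte offset 3.
    if first_area[3:11] == b"NTFS    ":
        return "ntfs"
    # FAT32 boot sector: informational type label "FAT32   " at offset 82.
    if first_area[82:90] == b"FAT32   ":
        return "fat32"
    # XFS: superblock magic "XFSB" at offset 0.
    if first_area[0:4] == b"XFSB":
        return "xfs"
    # ext2/3/4: superblock at offset 1024, 16-bit magic 0xEF53 (little-endian) at +56.
    if len(first_area) >= 1024 + 58:
        (magic,) = struct.unpack_from("<H", first_area, 1024 + 56)
        if magic == 0xEF53:
            return "ext"
    return None


def identify_os_type(first_area: bytes) -> Optional[str]:
    """Map the detected file system to an OS type, or None if unknown."""
    fs = detect_file_system(first_area)
    return FS_TO_OS.get(fs) if fs is not None else None
```

Signature checks like these read only a few kilobytes per data storage area, which fits the document's aim of keeping the identification step cheap compared with a full content comparison.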

The storage is not particularly limited. For example, as shown in FIG. 2, it may be storage in the storage device 104, or the target may be an external storage 2 connected to the device 1 via the communication network 3. The external storage 2 is not particularly limited and may be, for example, virtualized storage or storage in a user device. When the storage is virtualized storage, the identification unit 11 may further identify the type of operating system information based on file system information present in the first area of a virtual disk area of the data group in the virtualized storage.

An example in which the storage is virtualized storage will be described with reference to FIG. 5. FIG. 5(A) is a schematic diagram showing the relationship between virtual servers and the storage, and FIG. 5(B) is a schematic diagram showing an example of the structure of the data group in the storage of FIG. 5(A). In FIG. 5(A), each server is a virtual server running on a server virtualization platform, and each virtual server includes an OS and a file system. FIG. 5(A) shows three virtual servers, but this is only an example and the number is not limited to three. As shown in FIG. 5(A), the virtual servers share the data storage area of the virtualized storage. In the data group shown in FIG. 5(B), data is read starting from the left. As shown in FIG. 5(B), the data group in the virtualized storage has a virtual disk area for each virtual machine of the virtual servers. Virtualization platform file system information is located on the start side of the first virtual disk area, that is, before all of the virtual disk areas. Within each virtual disk area, the file system information is located in the earliest area on the start side (the first area), and the OS file is located after (to the right of) the file system information. Although not shown, other files may be located after the OS file.
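For virtualized storage, the signature check is applied to the first area of each virtual disk area rather than to the start of the shared storage, which begins with the virtualization platform's own file system information. The sketch below assumes the byte offset and length of each virtual disk area are already known; in practice they would have to be obtained from the platform's file system metadata, which is not modeled here. The region tuples, `FIRST_AREA_SIZE`, and the two-signature classifier are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

FIRST_AREA_SIZE = 4096  # assumed size of the "first area" read per virtual disk


def detect_os(first_area: bytes) -> Optional[str]:
    """Minimal hypothetical classifier covering two file systems."""
    if first_area[3:11] == b"NTFS    ":       # NTFS boot sector OEM ID
        return "windows"
    if first_area[1080:1082] == b"\x53\xef":  # ext2/3/4 magic 0xEF53, little-endian
        return "linux"
    return None


def identify_virtual_disks(
    storage: bytes, regions: List[Tuple[str, int, int]]
) -> Dict[str, Optional[str]]:
    """regions: (virtual disk name, byte offset, length) inside the shared storage.

    The platform-level file system information at the start of the storage is
    skipped; only the first area of each virtual disk area is examined."""
    result: Dict[str, Optional[str]] = {}
    for name, offset, length in regions:
        first_area = storage[offset:offset + min(length, FIRST_AREA_SIZE)]
        result[name] = detect_os(first_area)
    return result
```

Because every virtual disk area carries its own file system information, one shared datastore can yield several OS types, which is what lets the later detection step compare only virtual disks of the same type.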

The duplicate area detection unit 12 detects, in the data storage area, areas in which duplicate data exists from the data groups having the same type of OS information. As described above, duplicate data is data having identical contents that arises, for example, from the OS areas of the same type of OS, or from data backups and copies. An area in which duplicate data exists is also referred to as a duplicate area.
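One common way to detect duplicate areas, shown here as an assumption rather than the method the document prescribes, is to split each data group into fixed-size blocks, hash every block, and report hashes that occur more than once. The OS type restricts the search: each block is indexed only against data groups of the same OS type. `BLOCK_SIZE`, the choice of SHA-256, and the input layout are illustrative.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple

BLOCK_SIZE = 4096  # assumed fixed block size; the document does not specify one


def detect_duplicate_areas(
    data_groups: Dict[str, Tuple[str, bytes]]
) -> List[List[Tuple[str, int]]]:
    """data_groups: name -> (OS type, raw bytes).

    Returns lists of (name, block offset) locations holding identical blocks,
    where every location in a list belongs to a data group of the same OS type."""
    # Partition the data groups by OS type.
    by_os: Dict[str, List[str]] = defaultdict(list)
    for name, (os_type, _) in data_groups.items():
        by_os[os_type].append(name)

    duplicates: List[List[Tuple[str, int]]] = []
    for names in by_os.values():
        # A separate hash index per OS type: blocks are never compared across types.
        index: Dict[bytes, List[Tuple[str, int]]] = defaultdict(list)
        for name in names:
            data = data_groups[name][1]
            for off in range(0, len(data), BLOCK_SIZE):
                digest = hashlib.sha256(data[off:off + BLOCK_SIZE]).digest()
                index[digest].append((name, off))
        duplicates.extend(locs for locs in index.values() if len(locs) > 1)
    return duplicates
```

For example, if two Linux volumes and one Windows volume all begin with the same 4 KiB block, only the two Linux copies are reported: that is the narrowing the text describes, at the cost of missing duplicates that span different OS types. Since hashes can in principle collide, the bytes should be compared before anything is eliminated.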

The deduplication unit 13 stores the duplicate data in at least one of the areas and eliminates the duplicate data from the other areas. The duplicate data may be stored in one area or in two or more areas. For example, multiple instances of the duplicate data may be accumulated and stored, or only specific duplicate data may be stored. The specific duplicate data is not particularly limited; examples include the data with the latest recorded date and time, and the original data of a copy. Elimination may be performed, for example, by deleting the duplicate data from the other areas (that is, the areas other than the area in which it is stored), or by moving the duplicate data to the storage area, an external device, or the like.
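The text leaves two choices open: which copy to retain, and whether the other copies are deleted or moved. A small sketch of both, using the two retention examples the text names (latest recorded date and time, or the original of a copy) and an optional archive dictionary standing in for an external device. The `Copy` record, its fields, and the `store` mapping are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Copy:
    location: str      # hypothetical ID of the area holding this copy
    modified: float    # recorded date and time (e.g. a Unix timestamp)
    is_original: bool  # True if this is the source of a copy or backup


def choose_keeper(copies: List[Copy], policy: str = "latest") -> Copy:
    """Pick the copy to retain, using one of the examples given in the text."""
    if policy == "latest":
        return max(copies, key=lambda c: c.modified)
    if policy == "original":
        # Fall back to the first copy if no original is recorded.
        return next((c for c in copies if c.is_original), copies[0])
    raise ValueError(f"unknown policy: {policy!r}")


def eliminate(
    copies: List[Copy],
    store: Dict[str, bytes],
    policy: str = "latest",
    archive: Optional[Dict[str, bytes]] = None,
) -> Copy:
    """Remove every non-keeper copy from `store`. If `archive` is given
    (standing in for an external device), the removed data is moved there;
    otherwise it is simply deleted."""
    keeper = choose_keeper(copies, policy)
    for c in copies:
        if c is keeper:
            continue
        data = store.pop(c.location)
        if archive is not None:
            archive[c.location] = data
    return keeper
```

Moving rather than deleting frees space in the shared storage while keeping a recoverable copy elsewhere; deleting is the simpler option when the retained copy is considered sufficient.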

Next, an example of the processing in the device 1 will be described with reference to the block diagram of FIG. 1 and the flowchart of FIG. 3.

First, in the identification step, the type of operating system information is identified based on file system information present in the first area of a data group stored in a data storage area that has identification information in the storage (S1). When the storage is virtualized storage, the identification step may further identify the type of operating system information based on file system information present in the first area of a virtual disk area of the data group in the virtualized storage.

Next, in the duplicate area detection step, areas in which duplicate data exists are detected, in the data storage area, from the data groups having the same type of operating system information (S2).

Next, in the deduplication step, the duplicate data is stored in at least one of the areas and eliminated from the other areas (S3), and the processing ends (END).
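Steps S1 to S3 can be combined into a minimal end-to-end sketch. It assumes each data storage area is available as a byte string, that a caller-supplied `identify_os` function can read the OS type from the first `block_size` bytes, and that duplicates are found by block hashing; all three are illustrative assumptions. Keying the hash index by `(os_type, digest)` is equivalent to comparing only areas of the same OS type, and areas whose OS cannot be identified are left untouched.

```python
import hashlib
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

Loc = Tuple[str, int]  # (area name, block offset)


def run_deduplication(
    areas: Dict[str, bytes],
    identify_os: Callable[[bytes], Optional[str]],
    block_size: int = 4096,
) -> Tuple[Dict[Loc, bytes], Dict[Loc, Loc]]:
    # S1: identify the OS type of each area from its first area.
    os_of = {name: identify_os(data[:block_size]) for name, data in areas.items()}

    # S2: detect duplicate blocks, indexing only within the same OS type.
    blocks: Dict[Loc, bytes] = {}
    index: Dict[Tuple[str, bytes], List[Loc]] = defaultdict(list)
    for name, data in areas.items():
        for off in range(0, len(data), block_size):
            chunk = data[off:off + block_size]
            blocks[(name, off)] = chunk
            if os_of[name] is not None:  # areas with unknown OS are left alone
                index[(os_of[name], hashlib.sha256(chunk).digest())].append((name, off))

    # S3: keep the first copy of each duplicate and eliminate the others.
    references: Dict[Loc, Loc] = {}
    for locs in index.values():
        keeper, *others = locs
        for loc in others:
            if blocks[loc] == blocks[keeper]:  # guard against hash collisions
                del blocks[loc]
                references[loc] = keeper
    return blocks, references


# Usage with a toy identifier: first byte "L" = Linux, "W" = Windows.
def toy_identify(first: bytes) -> Optional[str]:
    return {b"L": "linux", b"W": "windows"}.get(first[:1])


kept, refs = run_deduplication(
    {"a": b"LXXXYYYY", "b": b"LXXXYYYY", "c": b"WXXXYYYY"}, toy_identify, block_size=4
)
print(refs)  # {('b', 0): ('a', 0), ('b', 4): ('a', 4)}
```

The function returns the blocks that remain stored and a reference table from each eliminated block to the block that still holds its contents. In the usage example, the `YYYY` block of area `c` is kept even though it matches the Linux copies, because area `c` has a different OS type.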

According to the present embodiment, identifying the OS narrows the range over which duplicate data must be searched. Therefore, even on low-cost (that is, low-performance) hardware, high-load processing can be minimized and processing can be completed in a short time.
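To make the narrowing concrete under a simplifying assumption, suppose the cost is dominated by comparing data storage areas pairwise (the document gives no cost figures). Six areas compared without regard to OS need 6 × 5 / 2 = 15 pairwise comparisons; split into two OS groups of three, they need 3 + 3 = 6. The helper names below are hypothetical.

```python
from collections import Counter
from typing import List


def pairwise(n: int) -> int:
    """Number of unordered pairs among n areas: n(n - 1) / 2."""
    return n * (n - 1) // 2


def comparisons_without_os(os_types: List[str]) -> int:
    # Every area is compared with every other area.
    return pairwise(len(os_types))


def comparisons_with_os(os_types: List[str]) -> int:
    # Areas are compared only within their own OS group.
    return sum(pairwise(k) for k in Counter(os_types).values())


areas = ["linux"] * 3 + ["windows"] * 3
print(comparisons_without_os(areas))  # 15
print(comparisons_with_os(areas))     # 6
```

The saving grows with the number of distinct OS types; if every area runs the same OS, grouping saves nothing, which is consistent with the method's premise that duplicates concentrate within one OS.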

[Embodiment 2]
In the present embodiment, a plurality of the data storage areas exist in the storage. Unless otherwise specified, the description of the first embodiment applies to the present embodiment.

As shown in FIG. 6, the device 1 of the present embodiment is the same as the deduplication device 1 of the first embodiment, except that it further includes a data storage area identification unit 14.

FIG. 7 is a schematic diagram showing an example of identifying the data storage areas. FIG. 7 is only an example, and the present embodiment is not limited to it. In FIG. 7, each server 20 (20a, 20b, and 20c) includes an OS and a file system. The number attached to each OS indicates the type of OS: for example, OS1 of the server 20a and OS1 of the server 20b are the same type of OS information, whereas OS1 of the server 20a and OS2 of the server 20c are different types of OS information. Likewise, the number attached to each file system indicates the type of file system. The servers 20 share a single storage 21. The storage 21 contains data storage areas 22 (22a, 22b, and 22c), each having a LUN (identification information) associated with one of the servers 20. When a plurality of data storage areas exist in the storage in this way, the data storage area identification unit 14 uses the identification information to identify the data storage areas having the same type of operating system information. That is, the data storage area identification unit 14 can identify the data storage areas 22a and 22b, which have the LUNs associated with the servers 20a and 20b that run OS1.
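The grouping performed by the data storage area identification unit 14 can be sketched as a dictionary keyed by OS type. Using the FIG. 7 arrangement with hypothetical LUN numbers 0, 1, and 2 for the data storage areas 22a, 22b, and 22c, only the OS1 group contains more than one LUN, so only the pair (0, 1) needs to be searched for duplicate areas. The LUN values and function names are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def identify_same_os_areas(lun_to_os: Dict[int, str]) -> Dict[str, List[int]]:
    """Group LUNs by OS type, keeping only groups with two or more LUNs,
    since only those produce pairs of areas to compare across."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for lun, os_type in sorted(lun_to_os.items()):
        groups[os_type].append(lun)
    return {os_type: luns for os_type, luns in groups.items() if len(luns) > 1}


def area_pairs_to_compare(groups: Dict[str, List[int]]) -> List[Tuple[int, int]]:
    """List the pairs of data storage areas searched for duplicate areas."""
    return [pair for luns in groups.values() for pair in combinations(luns, 2)]


# FIG. 7 with hypothetical LUNs: 22a -> 0, 22b -> 1, 22c -> 2.
fig7 = {0: "OS1", 1: "OS1", 2: "OS2"}
groups = identify_same_os_areas(fig7)
print(groups)                         # {'OS1': [0, 1]}
print(area_pairs_to_compare(groups))  # [(0, 1)]
```

Because the grouping uses only the LUN-to-OS association, the data storage area 22c is excluded before any of its contents are read.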

In the present embodiment, the duplicate area detection unit 12 detects, across the identified data storage areas, areas in which duplicate data exists from the data groups having the same type of operating system information. That is, the duplicate area detection unit 12 detects duplicate areas in the data groups across the data storage areas 22a and 22b.

The deduplication method of the present embodiment further includes a data storage area identification step. In the data storage area identification step, the data storage areas having the same type of operating system information are identified using the identification information. In the duplicate area detection step, areas in which duplicate data exists are detected, across the identified data storage areas, from the data groups having the same type of operating system information.

Like the deduplication device 1 of the first embodiment, the deduplication device 1 of the present embodiment minimizes high-load processing and enables short-time processing, even on low-performance hardware.

[Embodiment 3]
The program of the present embodiment is a program that can execute the deduplication method of the first and second embodiments on a computer. The program may be recorded, for example, on a computer-readable recording medium. The recording medium is not particularly limited; examples include a read-only memory (ROM), a hard disk (HD), and an optical disc.

Although the present invention has been described above with reference to embodiments, the present invention is not limited to them. Various changes that those skilled in the art can understand may be made to the configuration and details of the present invention within its scope.

According to the present invention, for example, high-load processing can be minimized at low cost and processing can be completed in a short time. The present invention is therefore particularly useful, for example, when data deduplication is performed on low-performance hardware.

1 Deduplication device
2 External storage
3 Communication network
11 Identification unit
12 Duplicate area detection unit
13 Deduplication unit
14 Data storage area identification unit
20 (20a, 20b, 20c) Server
21 Storage
22 (22a, 22b, 22c) Data storage area
100 Computer
101 CPU
102 Memory
103 Bus
104 Storage device
105 Input device
106 Display
107 Communication device

Claims

1. A deduplication device including an identification unit, a duplicate area detection unit, and a deduplication unit, wherein
the identification unit identifies the type of operating system information based on file system information present in a first area of a data group stored in a data storage area that has identification information in a storage;
the duplicate area detection unit detects, in the data storage area, an area in which duplicate data exists from the data groups having the same type of operating system information; and
the deduplication unit stores the duplicate data in at least one of the areas and eliminates the duplicate data from the other areas.

2. The deduplication device according to claim 1, further including a data storage area identification unit, wherein
the data storage area identification unit uses the identification information to identify the data storage areas having the same type of operating system information; and
the duplicate area detection unit detects, across the identified data storage areas, an area in which duplicate data exists from the data groups having the same type of operating system information.

3. The deduplication device according to claim 1 or 2, wherein the identification unit identifies the type of operating system information based on file system information present in the first area of a virtual disk area of the data group in a virtualized storage.

4. A deduplication method including an identification step, a duplicate area detection step, and a deduplication step, wherein
in the identification step, the type of operating system information is identified based on file system information present in a first area of a data group stored in a data storage area that has identification information in a storage;
in the duplicate area detection step, an area in which duplicate data exists is detected, in the data storage area, from the data groups having the same type of operating system information; and
in the deduplication step, the duplicate data is stored in at least one of the areas and eliminated from the other areas.

5. The deduplication method according to claim 4, further including a data storage area identification step, wherein
in the data storage area identification step, the data storage areas having the same type of operating system information are identified using the identification information; and
in the duplicate area detection step, an area in which duplicate data exists is detected, across the identified data storage areas, from the data groups having the same type of operating system information.

6. The deduplication method according to claim 4 or 5, wherein in the identification step, the type of operating system information is identified based on file system information present in the first area of a virtual disk area of the data group in a virtualized storage.

7. A program that can execute the method according to any one of claims 4 to 6 on a computer.

8. A computer-readable recording medium on which the program according to claim 7 is recorded.