JP2006065424A

JP2006065424A - Data storage system, data storage device, similar file recording method to be used for the same and program therefor

Info

Publication number: JP2006065424A
Application number: JP2004244517A
Authority: JP
Inventors: Naoshi Ochimaru; 直詩落丸
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2004-08-25
Filing date: 2004-08-25
Publication date: 2006-03-09

Abstract

PROBLEM TO BE SOLVED: To provide a data storage device that reconciles a reduction in data size with a shortening of data processing time.

SOLUTION: A data storage device 1 calculates, in a similarity deciding part 12, the dissimilarity between input data and data belonging to a model data group 111, and generates difference data between the input data and the model data in a difference generating part 122. When the dissimilarity exceeds a dissimilarity threshold, the data storage device 1 transfers the input data name, the model data name, the calculated similarity and the difference data generated during the calculation to a data managing part 11. The data managing part 11 compresses the difference data received from the similarity deciding part 12 in a data compressing part 114 and adds it to a compressed difference data group 113 for storage. The data managing part 11 also associates the compressed difference data name, the input data name and model data name received from the similarity deciding part 12, and the similarity with each other, and stores them in a data table.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a data storage system, a data storage device, a similar file recording method used therefor, and a program therefor, and more particularly to a method of recording similar files in an information processing system.

Conventionally, when an information processing system includes an OS (Operating System) or the like, a system image is often saved as a backup. However, because the saved system image is very large, it tends to strain the capacity of the data recording device.

One way to address this is to compress the system image before saving it (see, for example, Patent Documents 1 and 2). However, because each saved system image is independent, compression cannot exploit the similar portions shared between images, and compressing and decompressing the data takes time.

Patent Document 1: JP 2003-67232 A
Patent Document 2: JP 2003-99308 A

In the conventional information processing system described above, input data can be compressed to reduce the size of the data recorded on a data recording device. For some data, however, the size reduction achieved by compression is small and does not justify the time spent on compression and decompression.

Accordingly, an object of the present invention is to solve the above problems and to provide a data storage system, a data storage device, a similar file recording method used therefor, and a program therefor that achieve both a reduction in data size and a shortening of data processing time.

A data storage system according to the present invention comprises a computer including a data recording device capable of reading and writing data. The data recording device is provided with means for performing compression that uses the similar portions between data when a plurality of data are recorded on the data recording device, and this compression reduces the total size of the plurality of data.

Another data storage system according to the present invention comprises a computer including a data recording device capable of reading and writing data. The data recording device is provided with: similarity determination means for calculating the dissimilarity between input data and data belonging to a model data group stored in advance; difference generation means for generating difference data between the input data and the model data; data compression means for compressing the difference data received from the difference generation means; and data management means including a data table that stores the compressed difference data name in association with the input data name, the model data name, and the similarity received from the similarity determination means.

A data storage device according to the present invention is a data recording device capable of reading and writing data. It is provided with means for performing compression that uses the similar portions between data when a plurality of data are recorded, and this compression reduces the total size of the plurality of data.

Another data storage device according to the present invention is a data recording device capable of reading and writing data. It is provided with: similarity determination means for calculating the dissimilarity between input data and data belonging to a model data group stored in advance; difference generation means for generating difference data between the input data and the model data; data compression means for compressing the difference data received from the difference generation means; and data management means including a data table that stores the compressed difference data name in association with the input data name, the model data name, and the similarity received from the similarity determination means.

A similar file recording method according to the present invention is used in a data recording device capable of reading and writing data. On the data recording device side, the method includes a step of performing compression that uses the similar portions between data when a plurality of data are recorded, and this compression reduces the total size of the plurality of data.

Another similar file recording method according to the present invention is used in a data recording device capable of reading and writing data. On the data recording device side, the method includes the steps of: calculating the dissimilarity between input data and data belonging to a model data group stored in advance; generating difference data between the input data and the model data; compressing the difference data; and storing the compressed difference data name in a data table in association with the input data name, the model data name, and the similarity.

A program of a similar file recording method according to the present invention is used in a data recording device capable of reading and writing data. It causes a computer on the data recording device side to execute a process of performing compression that uses the similar portions between data when a plurality of data are recorded, and this compression reduces the total size of the plurality of data.

Another program of a similar file recording method according to the present invention is used in a data recording device capable of reading and writing data. It causes a computer on the data recording device side to execute: a process of calculating the dissimilarity between input data and data belonging to a model data group stored in advance; a process of generating difference data between the input data and the model data; a process of compressing the difference data; and a process of storing the compressed difference data name in a data table in association with the input data name, the model data name, and the similarity.

That is, the data storage device of the present invention is characterized in that, in a system comprising a computer provided with a data recording device capable of reading and writing data, it reduces the total data size of a group of similar data when a plurality of data are recorded on the data recording device.

The data storage device of the present invention reduces the total size of a plurality of data by compression that uses the similar portions between the data. At the same time, by judging the similarity between the data, it skips compression and decompression when their benefit would not justify the time they require.

More specifically, in the data storage device of the present invention, the similarity determination unit calculates the dissimilarity between the data input to the data storage device and the data belonging to the device's own model data group, and at the same time the difference generation unit generates difference data between the input data and the model data.

In the data storage device of the present invention, when the above calculation shows that the dissimilarity exceeds the dissimilarity threshold, the input data name, the model data name, the calculated similarity, and the difference data generated during the calculation are passed to the data management unit.

The data management unit compresses the difference data received from the similarity determination unit using the data compression means, and adds the result to the compressed difference data group for storage. The data management unit also stores the compressed difference data name in the data table in association with the input data name, the model data name, and the similarity received from the similarity determination unit. As a result, the information processing apparatus of the present invention can store data at a reduced size.

Furthermore, in the data storage device of the present invention, for two or more similar data, the difference generation unit takes the difference by computing an exclusive OR, which increases the logical continuity of the data, and only that difference is compressed and recorded, thereby reducing the size.

Furthermore, in the data storage device of the present invention, when the similarity determination unit finds that the dissimilarity between files is less than the dissimilarity threshold set by the threshold setting means, the difference generation and compression processes are judged to be of little benefit and are skipped, so that the data can be recorded without spending processing time on them.

Thus, by attaching a condition to the compression of recorded data, the data storage device of the present invention can achieve both a reduction in data size and a shortening of data processing time.

With the configuration and operation described below, the present invention achieves both a reduction in data size and a shortening of data processing time.

Next, an embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a data storage device according to an embodiment of the present invention. In FIG. 1, the data storage device 1 includes a data management unit 11, a similarity determination unit 12, data input means 13, a data restoration unit 14, data output means 15, and a recording medium 16.

The data management unit 11 includes a model data group 111, a data table 112, a compressed difference data group 113, a data compression unit 114, and a data decompression unit 115.

The similarity determination unit 12 includes a difference generation unit 122, which generates difference data and a dissimilarity 121, and threshold setting means 124, which sets a dissimilarity threshold 123. The similarity determination unit 12 determines whether the dissimilarity 121 exceeds the dissimilarity threshold 123.

The model data group 111 holds the data used as models when similarity is determined. When it is determined that the device holds no data similar to the input data, the input data is recorded uncompressed as model data. The compressed difference data group 113 holds data obtained by compressing the difference between model data and input data when it is determined that the device holds similar model data. The data table 112 manages the model data name, the compressed difference data name, and the similarity between these two data as elements of the input data name.

The data compression unit 114 compresses the difference data passed from the difference generation unit 122 and records it by adding it to the compressed difference data group 113. The data decompression unit 115 decompresses data included in the compressed difference data group 113 and passes it to the data restoration unit 14.

The difference generation unit 122 generates, as the difference, the exclusive OR of the input data and data included in the model data group 111, and while doing so calculates the dissimilarity 121 from the cumulative sum of the difference.

The data restoration unit 14 outputs, as restored data, the exclusive OR of the difference data decompressed by the data decompression unit 115 and the data in the model data group 111 that corresponds to that difference.
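
The round trip between difference generation and restoration can be illustrated with a minimal Python sketch. It is not taken from the source: the function names are hypothetical, the code works on bytes rather than on single bits (the result of the exclusive OR is the same), and the `original_length` argument is an assumption added here, because the text does not say how zero padding is removed when the input data is shorter than the model data.

```python
from itertools import zip_longest


def xor_difference(input_data: bytes, model_data: bytes) -> bytes:
    """XOR two byte strings, treating missing bytes of the shorter one as 0."""
    return bytes(a ^ b for a, b in zip_longest(input_data, model_data, fillvalue=0))


def restore(difference: bytes, model_data: bytes, original_length: int) -> bytes:
    """XOR the difference with the model data, then trim any zero padding.

    original_length is an assumption of this sketch, not part of the source.
    """
    return xor_difference(difference, model_data)[:original_length]


model = b"system image v1"
data = b"system image v2 patched"
diff = xor_difference(data, model)

# Identical leading bytes turn into zero bytes in the difference.
assert diff[:14] == bytes(14)
# XOR with the same model data restores the input.
assert restore(diff, model, len(data)) == data
```

Because matching regions become runs of zero bytes, the difference of two similar files is typically far more compressible than either file on its own.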

The recording medium 16 stores the program executed by the data management unit 11, the similarity determination unit 12, the data input means 13, the data restoration unit 14, and the data output means 15 of the data storage device 1. This program realizes the processing of each unit of the data storage device 1 described later.

The data storage device 1 uses the similarity determination unit 12 to calculate the dissimilarity 121 between the input data and the data belonging to its own model data group 111. At the same time, the difference generation unit 122 generates difference data between the input data and the model data. When the calculation shows that the dissimilarity exceeds the dissimilarity threshold 123, the similarity determination unit 12 passes the input data name, the model data name, the calculated similarity, and the difference data generated during the calculation to the data management unit 11.

The data management unit 11 compresses the difference data received from the similarity determination unit 12 with the data compression unit 114 and adds it to the compressed difference data group 113 for storage. The data management unit 11 also stores the compressed difference data name in the data table 112 in association with the input data name, the model data name, and the similarity received from the similarity determination unit 12. In this way, the present embodiment can store data at a reduced size.

In FIG. 1, in the present embodiment, for two or more similar data, the difference generation unit 122 takes the difference by computing an exclusive OR, which increases the logical continuity of the data, and only that difference is compressed and recorded, thereby reducing the data size. Also, in the present embodiment, when the similarity determination unit 12 finds that the dissimilarity 121 between files is less than the dissimilarity threshold 123 set by the threshold setting means 124, the difference generation and compression processes are judged to be of little benefit and are skipped, so that the data can be recorded without spending processing time on them.

FIG. 2 is a block diagram showing the configuration of the data table 112 of FIG. 1. In FIG. 2, the data table 112 includes an input data name group 1121, a compressed difference data name group 1122, a model data name group 1123, and a dissimilarity group 1124.

The input data name group 1121 contains the names of all input data recorded in the device. For input data recorded in the device as compressed difference data, the input data name has the following elements: its compressed difference data name in the compressed difference data name group 1122, the name of the model data used to generate the difference data in the model data name group 1123, and the dissimilarity in the dissimilarity group 1124.
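
One way to picture the data table 112 is as a mapping from input data names to their elements. The Python sketch below is illustrative only: `TableEntry`, `data_table`, `inputs_using_model`, and the sample names and values are all hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TableEntry:
    """Elements attached to one input data name (hypothetical layout)."""
    compressed_diff_name: Optional[str] = None  # compressed difference data name group 1122
    model_name: Optional[str] = None            # model data name group 1123
    dissimilarity: Optional[float] = None       # dissimilarity group 1124


# The keys play the role of the input data name group 1121.
data_table: Dict[str, TableEntry] = {
    "image_a": TableEntry(),                                 # stored as model data, no elements
    "image_b": TableEntry("image_b.diff", "image_a", 0.02),  # stored as a compressed difference
}


def inputs_using_model(table: Dict[str, TableEntry], model_name: str) -> List[str]:
    """Return the input data names whose difference was taken against model_name."""
    return [name for name, entry in table.items() if entry.model_name == model_name]


print(inputs_using_model(data_table, "image_a"))
```

A lookup such as `inputs_using_model` is what a deletion of model data would need in order to find the input data that still depend on that model.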

FIGS. 3 to 7 are flowcharts showing the operation of the data storage device 1 according to an embodiment of the present invention. The operation of the data storage device 1 will be described with reference to FIGS. 1 to 7. The processing shown in FIGS. 3 to 7 is realized by each unit of the data storage device 1 executing the program on the recording medium 16.

The data recording device 1 must set a dissimilarity threshold before recording data. As shown in FIG. 3, the threshold setting means 124 sets the dissimilarity threshold 123 (step S1 in FIG. 3).

FIG. 4 shows the operation of the data recording device 1 when recording input data. Data is input to the data recording device 1 from outside through the data input means 13, and the input data is passed to the similarity determination unit 12 (step S11 in FIG. 4).

The data management unit 11 checks whether the model data group 111 contains at least one model data, and further whether any model data remains whose dissimilarity has not yet been checked (step S12 in FIG. 4). If there is no model data at all, or if no unchecked model data remains, the input data is passed from the similarity determination unit 12 to the data management unit 11 and added to the model data group 111 as model data (step S13 in FIG. 4).

On the other hand, if at least one model data remains whose dissimilarity can be checked, the data management unit 11 reads one model data from the model data group 111 in recording order and passes it to the similarity determination unit 12 (step S14 in FIG. 4). The similarity determination unit 12 uses the difference generation unit 122 to compute the exclusive OR of the first bit of the input data and the first bit of the model data passed from the data management unit 11, and takes the result as difference data.

Each time it computes a difference, the difference generation unit 122 also computes the cumulative sum of the exclusive ORs, divides that sum by the bit size of the larger of the input data and the model data, and takes the result as the dissimilarity 121 (step S15 in FIG. 4).
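
The dissimilarity described above can be written compactly. The symbols are introduced here for illustration and do not appear in the source: $x_i$ and $m_i$ are the $i$-th bits of the input data and the model data (the shorter one being padded with zeros, as in step S19), and $N$ is the larger of the two bit lengths. After $k$ bits have been processed, the running value is:

```latex
d_k = \frac{1}{N} \sum_{i=1}^{k} \left( x_i \oplus m_i \right),
\qquad N = \max\left( |x|, |m| \right), \quad 1 \le k \le N
```

For example, for $x = 1011$ and $m = 1001$ the exclusive OR is $0010$, so $d_4 = 1/4 = 0.25$. Because $d_k$ never decreases as $k$ grows, a model data can be rejected as soon as $d_k$ exceeds the threshold, without processing the remaining bits.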

The similarity determination unit 12 checks whether the dissimilarity 121 exceeds the dissimilarity threshold 123 (step S16 in FIG. 4). If it does, the unit stops using the model data it had read and checks again whether any other model data remains to be checked (step S12 in FIG. 4).

If the dissimilarity 121 does not exceed the dissimilarity threshold 123, the similarity determination unit 12 checks whether the model data and the input data both have the next bit needed to generate the next difference (step S17 in FIG. 4). If they do, the unit generates the difference from those bits and calculates the dissimilarity 121 (step S15 in FIG. 4).

Otherwise, the similarity determination unit 12 checks whether either the model data or the input data has a next bit (steps S17 and S18 in FIG. 4). If one of them does, the unit sets the next bit of the data that has run out to 0 (step S19 in FIG. 4). It then generates the difference against the next bit of the data that has not run out and calculates the dissimilarity 121 (step S15 in FIG. 4).

If neither the model data nor the input data has a next bit, the similarity determination unit 12 passes the generated difference bit string to the data management unit 11, where the data compression unit 114 compresses it (step S20 in FIG. 4).

After this, the data management unit 11 uses the input data name as a reference tag and records in the data table 112, as its elements, the name of the difference data compressed by the data compression unit 114, the name of the model data against which the difference was taken, and the calculated dissimilarity (step S21 in FIG. 4). The difference data compressed by the data compression unit 114 is added to the compressed difference data group 113 as compressed difference data (step S22 in FIG. 4).
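
Steps S12 to S22 can be sketched in Python as follows. This is a rough illustration under several assumptions that are not in the source: the function name `record` and the dictionary layout are hypothetical, `zlib` stands in for the unspecified compression method, the threshold is checked once per byte rather than once per bit (the accept or reject outcome is the same because the running dissimilarity never decreases), and a `length` field is stored so that zero padding can be trimmed on restoration.

```python
import zlib
from itertools import zip_longest


def record(name, data, models, diffs, table, threshold):
    """Store data as a compressed difference if a similar model exists, else as a model."""
    for model_name, model in models.items():            # S12, S14: models in recording order
        size_bits = max(len(data), len(model)) * 8       # larger of the two sizes, in bits
        differing_bits = 0
        diff = bytearray()
        for a, b in zip_longest(data, model, fillvalue=0):  # S17 to S19: pad with zeros
            x = a ^ b                                     # S15: exclusive OR
            diff.append(x)
            differing_bits += bin(x).count("1")           # cumulative sum of differing bits
            if differing_bits / size_bits > threshold:    # S16: too dissimilar, try next model
                break
        else:
            diff_name = name + ".diff"                    # hypothetical naming scheme
            diffs[diff_name] = zlib.compress(bytes(diff))  # S20, S22
            table[name] = {                               # S21
                "diff": diff_name,
                "model": model_name,
                "dissimilarity": differing_bits / size_bits if size_bits else 0.0,
                "length": len(data),                      # assumption: needed to trim padding
            }
            return "difference"
    models[name] = data                                   # S13: keep uncompressed as model data
    return "model"


models, diffs, table = {}, {}, {}
print(record("a", b"A" * 100, models, diffs, table, 0.1))             # no model yet
print(record("b", b"A" * 99 + b"B", models, diffs, table, 0.1))       # close to "a"
print(record("c", b"\xff" * 100, models, diffs, table, 0.1))          # unlike "a"
```

In this run the first and third inputs end up as model data and the second is stored as a compressed difference against `"a"`.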

FIG. 5 shows the operation of the data recording device 1 when reading recorded data. The data management unit 11 of the data recording device 1 checks whether data with the requested name exists in the model data group 111 (step S31 in FIG. 5). If it does, the data management unit 11 passes the corresponding model data to the data output means 15, which outputs it to the outside (step S37 in FIG. 5).

If the requested data does not exist as model data, the data management unit 11 checks whether the requested data name is in the input data name group 1121 of the data table 112 (step S32 in FIG. 5). If the name is not there, the process ends.

If the requested data name is in the input data name group 1121, the data management unit 11 looks up the elements of that input data in the compressed difference data name group 1122 and the model data name group 1123 (step S33 in FIG. 5). The data management unit 11 reads the data with the referenced compressed difference data name from the compressed difference data group 113, decompresses it with the data decompression unit 115, and passes it to the data restoration unit 14 as difference data (step S34 in FIG. 5).

The data management unit 11 reads the data with the referenced model data name from the model data group 111 and passes it to the data restoration unit 14 (step S35 in FIG. 5). The data restoration unit 14 takes the exclusive OR of the difference data and the model data passed from the data management unit 11, passes the result to the data output means 15 as restored data, and the data is output to the outside (step S36 in FIG. 5).
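
The read path of steps S31 to S37 might look like the following Python sketch. As before, this is an illustration rather than the source's implementation: the function name `read` and the dictionary layout are hypothetical, `zlib` stands in for the unspecified compression method, and the `length` field is an assumption used to trim zero padding.

```python
import zlib


def read(name, models, diffs, table):
    """Return the stored data for name, or None if the name is unknown."""
    if name in models:                                   # S31: stored as model data
        return models[name]                              # S37
    entry = table.get(name)                              # S32: look up the input data name
    if entry is None:
        return None
    diff = zlib.decompress(diffs[entry["diff"]])         # S33, S34: decompress the difference
    model = models[entry["model"]]                       # S35: fetch the model data
    padded = model.ljust(len(diff), b"\x00")             # model shorter than input: pad with 0
    restored = bytes(d ^ m for d, m in zip(diff, padded))  # S36: exclusive OR
    return restored[: entry["length"]]                   # assumption: trim zero padding


models = {"a": b"hello world"}
raw_diff = bytes(x ^ y for x, y in zip(b"hello WORLD", b"hello world"))
diffs = {"b.diff": zlib.compress(raw_diff)}
table = {"b": {"diff": "b.diff", "model": "a", "length": 11}}

assert read("a", models, diffs, table) == b"hello world"
assert read("b", models, diffs, table) == b"hello WORLD"
assert read("missing", models, diffs, table) is None
```

Reading model data needs no decompression at all, which is where the time saving for dissimilar data comes from.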

FIGS. 6 and 7 show the operation of the data recording device 1 when deleting recorded data. The data management unit 11 of the data recording device 1 checks whether the data name in the deletion request is in the input data name group 1121 of the data table 112 (step S41 in FIG. 6). If an input data name matching the request exists, the data management unit 11 looks up its compressed difference data name element, deletes the compressed difference data with that name (step S42 in FIG. 6), deletes the matching input data name and its elements from the data table 112 (step S43 in FIG. 6), and ends the process.

If no data with the requested name is in the input data name group 1121, the data management unit 11 checks whether model data with the requested name exists in the model data group 111 (step S44 in FIG. 6). If no such model data exists, the process ends.

If model data with the requested name exists, the data management unit 11 checks whether that name appears in the model data name group 1123, that is, among the elements of the input data name group 1121 in the data table 112 (step S45 in FIG. 6). If it does not appear, the data management unit 11 deletes the model data with the requested name from the model data group 111 (step S53 in FIG. 7) and ends the process.

If the requested name does appear in the model data name group 1123, the data management unit 11 consults the dissimilarity group 1124 for the input data that have the model data name to be deleted as an element, and looks up the compressed difference data name and the model data name of the input data with the smallest dissimilarity (step S46 in FIG. 6).

The data management unit 11 then reads the data having the referenced model data name from the model data group 111 and passes it to the data restoration unit 14. It also reads the data having the referenced compressed difference data name from the compressed difference data group 113, converts it back into difference data with the data decompression unit 115, and passes the result to the data restoration unit 14 (step S47 in FIG. 6).

The data restoration unit 14 takes the exclusive OR of the difference data and the model data passed from the data management unit 11 as the restored data, and passes that data back to the data management unit 11. The data management unit 11 adds the data received from the data restoration unit 14 to the model data group 111 as new model data (step S48 in FIG. 6).
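
The restoration works because exclusive OR is its own inverse: if the difference is diff = input XOR model, then diff XOR model gives back the input. The short sketch below illustrates this round trip. It assumes both byte strings have the same length and uses Python's `zlib` as a stand-in compressor, since the text here does not name a compression algorithm; `xor_bytes` is a hypothetical helper.

```python
import zlib


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise exclusive OR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length data")
    return bytes(x ^ y for x, y in zip(a, b))


model = b"system image v1"     # model data
original = b"system image v2"  # input data similar to the model

# Recording: the difference is mostly zero bytes, so it compresses well.
compressed_diff = zlib.compress(xor_bytes(original, model))

# Restoration: decompress the difference, then XOR it with the model data.
restored = xor_bytes(zlib.decompress(compressed_diff), model)
```

After this runs, `restored` holds the same bytes as `original`.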

The data management unit 11 deletes the data name of the restored input data and its elements from the data table 112, and deletes the compressed difference data used for the restoration from the compressed difference data group 113 (step S49 in FIG. 6).

Next, the data management unit 11 checks whether the data table 112 contains any other input data that has the model name to be deleted as an element (step S50 in FIG. 7). If there is none, it deletes the model data having the requested name from the model data group 111 (step S53 in FIG. 7) and ends the process.

If there is other such input data, the data management unit 11 has the data restoration unit 14 restore it in the same way as described above, deletes the input data name and its elements from the data table 112, and deletes the compressed difference data used for the restoration from the compressed difference data group 113. The data restoration unit 14 passes the restored data to the similarity determination unit 12 (step S51 in FIG. 7).

The similarity determination unit 12 determines the similarity between the data passed from the data restoration unit 14 and the new model data, that is, the data that had the smallest dissimilarity and was restored above, and records the result in the same way as for externally input data (step S52 in FIG. 7). The data management unit 11 then checks again whether the data table 112 contains any other input data that has the model name to be deleted as an element (step S50 in FIG. 7).
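
The model deletion flow of steps S44 to S53 can be sketched as below. This is an illustration under several assumptions that the text does not confirm: the stores are plain dictionaries, `zlib` stands in for the compressor, the dissimilarity is taken to be the fraction of nonzero bytes in the XOR difference, and the promoted model keeps the name of the input data it was restored from. The threshold check that applies to externally input data is omitted, so every remaining dependent is simply stored again as a difference against the new model. All function and field names are hypothetical.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Entry:
    diff_name: str        # compressed difference data name
    model_name: str       # model data name
    dissimilarity: float  # dissimilarity against that model


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def dissimilarity_of(diff):
    # Assumed measure: fraction of bytes that differ.
    return sum(1 for b in diff if b) / max(len(diff), 1)


def restore(entry, models, diffs):
    """S47/S48: decompress the difference and XOR it with its model data."""
    return xor_bytes(zlib.decompress(diffs[entry.diff_name]), models[entry.model_name])


def delete_model(name, models, table, diffs):
    if name not in models:                                   # S44
        return
    dependents = [n for n, e in table.items() if e.model_name == name]  # S45
    if dependents:
        # S46: pick the dependent with the smallest dissimilarity.
        new_name = min(dependents, key=lambda n: table[n].dissimilarity)
        entry = table[new_name]
        new_model = restore(entry, models, diffs)            # S47
        models[new_name] = new_model                         # S48: promote to model data
        del diffs[entry.diff_name]                           # S49
        del table[new_name]
        for other in dependents:                             # S50: any other dependents?
            if other == new_name:
                continue
            old = table.pop(other)                           # S51: restore and drop old record
            data = restore(old, models, diffs)
            del diffs[old.diff_name]
            diff = xor_bytes(data, new_model)                # S52: re-record against new model
            diff_name = other + ".diff"                      # hypothetical naming scheme
            diffs[diff_name] = zlib.compress(diff)
            table[other] = Entry(diff_name, new_name, dissimilarity_of(diff))
    del models[name]                                         # S53
```

Choosing the dependent with the smallest dissimilarity as the new model follows step S46; the sketch collects the dependents once up front instead of re-querying the table on each pass through step S50, which gives the same result for this simplified case.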

Thus, in this embodiment, attaching a condition to the compression of recorded data makes it possible to both reduce the data size and shorten the data processing time.

FIG. 8 is a block diagram showing the configuration of a data storage system according to another embodiment of the present invention. In FIG. 8, the data storage system comprises a computer 2, a data recording medium 3, a recording medium 4, and a medium transport path 100 that connects the computer 2 and the data recording medium 3. The medium transport path 100 is a network, in particular when the data recording medium 3 is a recording device on a network.

The computer 2 includes data recording medium reading means 22 for reading the data on the data recording medium 3, and virtual data output unit execution means 21 for executing the virtual data output unit 31 of the data recording medium 3. The recording medium 4 stores a program to be executed by the computer 2, and the processing of each means is realized by the computer 2 executing that program.

The data recording medium 3 is a recording medium such as a CD-ROM (Compact Disc-Read Only Memory) or DVD-ROM (Digital Versatile Disc-Read Only Memory) provided with the virtual data output unit 31; the term also covers recording devices on a network.

The virtual data output unit 31 contains only those parts of the configuration shown in FIG. 1 that are needed to output data. That is, the virtual data output unit 31 includes a data management unit 311, a data restoration unit 312, and data output means 313, and the data management unit 311 consists of a model data group 3111, a data table 3112, a compressed difference data group 3113, and data decompression means 3114.

When the data recording medium 3 is created, the model data group 3111, the data table 3112, and the compressed difference data group 3113 of the virtual data output unit 31 are copies of the state in which data has been recorded by the configuration shown in FIG. 1 described above.

FIG. 9 is a flowchart showing the operation when the computer 2 of FIG. 8 reads data from the data recording medium 3. The operation when the computer 2 according to this other embodiment reads data from the data recording medium 3 is described below with reference to FIGS. 8 and 9. The processing shown in FIG. 9 is realized by the computer 2 executing the program on the recording medium 4.

In the computer 2, the data recording medium reading means 22 reads the virtual data output unit 31 from the data recording medium 3 and passes it to the virtual data output unit execution means 21 (step S61 in FIG. 9).

The virtual data output unit execution means 21 runs the virtual data output unit 31 received from the data recording medium reading means 22, performing the same operation as a read in the configuration shown in FIG. 1 (the operation shown in steps S31 to S37 of FIG. 5), and extracts the target data (requested data) (step S62 in FIG. 9).

In this embodiment, the size of the data recorded on the data recording medium 3 is reduced, yet data included in the model data group 3111 can be read faster than from a medium on which all data is compressed.
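
The speed advantage for model data can be seen in a sketch of the read path. The details of steps S31 to S37 are not reproduced in this part of the text, so the following is an assumption-laden illustration: model data is stored uncompressed and returned as is, while any other data is rebuilt by decompressing its difference and XORing it with its model. The dictionary layout, the use of `zlib`, and the name `read_data` are all hypothetical.

```python
import zlib


def read_data(name, models, table, diffs):
    """Return the requested data from the replicated stores on the medium."""
    if name in models:
        # Model data is stored as is: no decompression is needed.
        return models[name]
    diff_name, model_name = table[name]     # look up the data table
    diff = zlib.decompress(diffs[diff_name])  # data decompression means
    # Data restoration: XOR the difference with the model data.
    return bytes(x ^ y for x, y in zip(diff, models[model_name]))
```

Only the second branch pays the cost of decompression, which is why data belonging to the model data group can be read faster than from a medium that compresses everything.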

In the information recording field of the information processing industry, the present invention can be used for backing up information systems, for distributing information systems to different platforms on information recording media, and the like.

FIG. 1 is a block diagram showing the configuration of a data storage device according to one embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the data table of FIG. 1.
FIG. 3 is a flowchart showing the operation of the data storage device according to one embodiment of the present invention.
FIG. 4 is a flowchart showing the operation of the data storage device according to one embodiment of the present invention.
FIG. 5 is a flowchart showing the operation of the data storage device according to one embodiment of the present invention.
FIG. 6 is a flowchart showing the operation of the data storage device according to one embodiment of the present invention.
FIG. 7 is a flowchart showing the operation of the data storage device according to one embodiment of the present invention.
FIG. 8 is a block diagram showing the configuration of a data storage system according to another embodiment of the present invention.
FIG. 9 is a flowchart showing the operation when the computer of FIG. 8 reads data from the data recording medium.

Explanation of symbols

1 Data storage device
2 Computer
3 Data recording medium
4, 16 Recording medium
11 Data management unit
12 Similarity determination unit
13 Data input means
14, 312 Data restoration unit
15, 313 Data output means
21 Virtual data output unit execution means
22 Data recording medium reading means
31 Virtual data output unit
111, 3111 Model data group
112, 3112 Data table
113, 3113 Compressed difference data group
114 Data compression unit
115 Data decompression unit
121 Dissimilarity
122 Difference generation unit
123 Dissimilarity threshold
124 Threshold setting means
1121 Input data name group
1122 Compressed difference data name group
1123 Model data name group
1124 Dissimilarity group
3114 Data decompression means

Claims

1. A data storage system comprising a computer that includes a data recording device capable of reading and writing data, wherein the data recording device has means for performing compression that uses similar portions between data when recording a plurality of data, and the total size of the plurality of data is reduced by the compression.

2. The data storage system according to claim 1, wherein the data recording device includes means for determining the similarity between the data, and the compression and the decompression are omitted when either the compression effect or the decompression effect is not commensurate with the time required for at least one of compressing and decompressing the data.

3. A data storage system comprising a computer that includes a data recording device capable of reading and writing data, wherein the data recording device has: similarity determination means for calculating the dissimilarity between input data and data belonging to a previously accumulated model data group; difference generation means for generating difference data between the input data and the model data; and data management means including data compression means for compressing the difference data received from the difference generation means and a data table that stores the compressed difference data name in association with the input data name, the model data name, and the similarity received from the similarity determination means.

4. The data storage system according to claim 3, wherein, when the calculation by the similarity determination means shows that the dissimilarity exceeds a dissimilarity threshold, the input data name, the model data name, the calculated similarity, and the difference data generated during the calculation are passed to the data management means.

5. The data storage system according to claim 4, wherein the difference generation processing and the compression processing of the data are suppressed when the dissimilarity is less than the dissimilarity threshold.

6. The data storage system according to any one of claims 3 to 5, wherein the difference generation means generates the difference data by taking the exclusive OR of two or more similar data, and the total size of the plurality of data is reduced by compressing and recording only the difference data.

7. A data recording device capable of reading and writing data, comprising means for performing compression that uses similar portions between data when recording a plurality of data, wherein the total size of the plurality of data is reduced by the compression.

8. The data storage device according to claim 7, further comprising means for determining the similarity between the data, wherein the compression and the decompression are omitted when either the compression effect or the decompression effect is not commensurate with the time required for at least one of compressing and decompressing the data.

9. A data recording device capable of reading and writing data, comprising: similarity determination means for calculating the dissimilarity between input data and data belonging to a previously accumulated model data group; difference generation means for generating difference data between the input data and the model data; and data management means including data compression means for compressing the difference data received from the difference generation means and a data table that stores the compressed difference data name in association with the input data name, the model data name, and the similarity received from the similarity determination means.

10. The data storage device according to claim 9, wherein, when the calculation by the similarity determination means shows that the dissimilarity exceeds a dissimilarity threshold, the input data name, the model data name, the calculated similarity, and the difference data generated during the calculation are passed to the data management means.

11. The data storage device according to claim 10, wherein the difference generation processing and the compression processing of the data are suppressed when the dissimilarity is less than the dissimilarity threshold.

12. The data storage device according to any one of claims 9 to 11, wherein the difference generation means generates the difference data by taking the exclusive OR of two or more similar data, and the total size of the plurality of data is reduced by compressing and recording only the difference data.

13. A similar file recording method used in a data recording device capable of reading and writing data, comprising a step, on the data recording device side, of performing compression that uses similar portions between data when recording a plurality of data, wherein the total size of the plurality of data is reduced by the compression.

14. The similar file recording method according to claim 13, further comprising a step, on the data recording device side, of determining the similarity between the data, wherein the compression and the decompression are omitted when either the compression effect or the decompression effect is not commensurate with the time required for at least one of compressing and decompressing the data.

15. A similar file recording method used in a data recording device capable of reading and writing data, comprising the steps, on the data recording device side, of: calculating the dissimilarity between input data and data belonging to a previously accumulated model data group; generating difference data between the input data and the model data; compressing the difference data; and storing the compressed difference data name, the input data name, the model data name, and the similarity in association with one another in a data table.

16. The similar file recording method according to claim 15, wherein, when the result of the step of calculating the dissimilarity shows that the dissimilarity exceeds a dissimilarity threshold, the input data name, the model data name, the calculated similarity, and the difference data generated during the calculation are passed to the step of storing in the data table.

17. The similar file recording method according to claim 16, wherein the difference generation processing and the compression processing of the data are suppressed when the dissimilarity is less than the dissimilarity threshold.

18. The similar file recording method according to any one of claims 15 to 17, wherein the step of generating the difference data generates the difference data by taking the exclusive OR of two or more similar data, and the total size of the plurality of data is reduced by compressing and recording only the difference data.

19. A program for a similar file recording method used in a data recording device capable of reading and writing data, the program causing a computer on the data recording device side to execute processing that performs compression using similar portions between data when recording a plurality of data, wherein the total size of the plurality of data is reduced by the compression.

20. A program for a similar file recording method used in a data recording device capable of reading and writing data, the program causing a computer on the data recording device side to execute: processing for calculating the dissimilarity between input data and data belonging to a previously accumulated model data group; processing for generating difference data between the input data and the model data; processing for compressing the difference data; and processing for storing the compressed difference data name, the input data name, the model data name, and the similarity in association with one another in a data table.