JPH0772859B2

JPH0772859B2 - Data compression method, data decompression method and device

Info

Publication number: JPH0772859B2
Application number: JP3339307A
Authority: JP
Inventors: 満夫水口
Original assignee: 株式会社デンサン
Priority date: 1991-11-29
Filing date: 1991-11-29
Publication date: 1995-08-02
Anticipated expiration: 2010-08-02
Also published as: JPH05150940A

Description

Detailed Description of the Invention

【０００１】[0001]

[Technical Field] This invention relates to a data compression method and device and to a data decompression method and device. It is intended in particular for sending and receiving (transmitting) data between computers over a communication line. It is well suited to the case where both computers hold an old data file, one computer updates the old data file to create a new data file, and the newly created data file is transmitted to the other computer.

【０００２】[0002]

[Prior Art and Its Problems] In communication between computers, various data compression techniques are used to improve communication efficiency. Fundamentally, these methods compress efficiently only when the data is highly redundant or has regularity.

【０００３】 One example of a conventional compression method is as follows.

【０００４】 When the same character occurs two or more times in succession, the character is written twice, followed by data giving the number of further occurrences of that character after the first two. For example, two consecutive identical characters such as AA are compressed to AA0. AAA is compressed to AA1, and AAAA to AA2.

【０００５】 This method has the problem that the compressed data actually becomes longer when the input contains many two-character runs such as AA. A three-character run such as AAA gains nothing from compression.
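The conventional scheme described above can be sketched in a few lines of Python. This is only an illustration: the source does not say how the repeat count is stored, so the sketch assumes a single decimal digit (and therefore caps a run at 11 characters) and assumes the input text contains no digits.

```python
def rle_encode(text: str) -> str:
    """Encode each run as: the character twice, then a digit for the extra repeats."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        run = 1
        # Assumption: cap a run at 11 so the count (run - 2) stays one digit, 0-9.
        while i + run < len(text) and text[i + run] == ch and run < 11:
            run += 1
        if run >= 2:
            out.append(ch * 2 + str(run - 2))
        else:
            out.append(ch)
        i += run
    return "".join(out)


for sample in ("AA", "AAA", "AAAA", "AAAAAA"):
    encoded = rle_encode(sample)
    print(sample, "->", encoded, f"({len(sample)} -> {len(encoded)} characters)")
```

A two-character run grows from 2 to 3 characters and a three-character run stays at 3, which are the two drawbacks noted in the text; only runs of four or more become shorter.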

【０００６】 Conventional data compression methods can achieve a high compression ratio on highly redundant or strongly regular data by exploiting those properties. However, for data that appears strongly random when viewed as data, such as an executable file (program), compression has very little effect, or the data length even increases as described above.

【０００７】[0007]

[Object, Configuration, Operation and Effects of the Invention] An object of this invention is to provide a data compression method and device that can compress even a strongly random data file with high efficiency, provided that a reference file exists.

【０００８】 A further object of this invention is to provide a data decompression method and device that can restore data compressed by the above data compression method and device to the original data.

【０００９】 The data compression method according to this invention partially compares a data file to be compressed with its corresponding reference file and searches the data file for a record of arbitrary data length that has a high data matching rate with a part of the reference file. When such a record is found, the method performs an exclusive OR operation between the record found in the data file and the corresponding record of the reference file, compresses the data obtained by the exclusive OR operation, and creates a compressed data record by adding auxiliary data for restoration to the resulting compressed data. For a record of the data file that has a low matching rate with every part of the reference file, the method creates a compressed data record by adding auxiliary data for restoration to the record itself. These compressed data records are then edited to create a compressed file.

【００１０】 The data compression method according to this invention presupposes that a reference file corresponding to the data file to be compressed exists. One example of such a file is a program in executable form. For instance, the data file to be compressed is the new version of a program and the corresponding reference file is the old version. A version upgrade normally changes only part of a program, so much of the old version remains in the new version. Consequently, although the new and old versions do not necessarily match when compared sequentially from their beginnings, many parts match at a rate close to 100% once the positions being compared are shifted.

【００１１】 This invention exploits the existence of a reference file that partially matches the data file to be compressed, as is the case with the new and old versions of a program.

【００１２】 According to this invention, the data file to be compressed is first partially compared with its corresponding reference file, and the data file is searched for a record of arbitrary data length that has a high data matching rate with a part of the reference file.

【００１３】 As noted above, the new and old versions of a program contain parts that match at a rate close to 100%, while the other parts hardly match at all. When the data file either contains a part that is very similar (or almost identical) to a part of the reference file or else hardly matches it, the matching rates fall into two separate regions, one very high (near 100%) and one very low (near 0%). Setting a threshold between the two regions therefore makes it very easy to decide whether a data matching rate is high. When the matching rates are spread continuously from 0% to 100%, a suitable threshold (for example 50%) can be fixed and used to judge whether the rate is high or low. A high or low data matching rate is a relative notion, so a threshold should be set for each application that implements this invention and used to distinguish high from low.
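The threshold test can be illustrated as follows. The source does not fix the unit in which the matching rate is measured, so this sketch assumes a byte-by-byte comparison of two equal-length records; the names `match_rate` and `is_similar` are hypothetical, and the default of 0.5 simply mirrors the 50% example given in the text.

```python
def match_rate(a: bytes, b: bytes) -> float:
    """Fraction of positions at which two equal-length records hold the same byte."""
    if len(a) != len(b) or not a:
        raise ValueError("records must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def is_similar(a: bytes, b: bytes, threshold: float = 0.5) -> bool:
    """Judge the matching rate 'high' when it reaches the application's threshold."""
    return match_rate(a, b) >= threshold


print(match_rate(b"ABCD", b"ABCX"))   # three of the four bytes agree
print(is_similar(b"ABCD", b"WXYZ"))   # no byte agrees
```

In an application where the rates cluster near 0% and near 100%, almost any threshold between the clusters gives the same decisions.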

【００１４】 In this invention, a record means a collection of contiguous data of arbitrary length. The term covers a part of a file found by the search for file parts with a high data matching rate described above, a part of a file judged to have a low matching rate, and the collection of data (including the auxiliary data for restoration) obtained by applying compression processing to them.

【００１５】 Next, according to this invention, when a record with a high data matching rate is found, an exclusive OR operation is performed between the record found in the data file and the corresponding record of the reference file. The data string obtained by the exclusive OR of two records with a high data matching rate contains many runs of zero words and is therefore well suited to data compression.
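A small sketch shows why the exclusive OR helps. The helper name `xor_records` and the sample byte strings are hypothetical; the operation itself is the ordinary bytewise XOR.

```python
def xor_records(a: bytes, b: bytes) -> bytes:
    """Bytewise exclusive OR of two equal-length records."""
    if len(a) != len(b):
        raise ValueError("records must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


old_record = b"MOV AX,1; ADD AX,2"   # hypothetical record from the reference file
new_record = b"MOV AX,1; ADD AX,3"   # hypothetical record from the data file

diff = xor_records(new_record, old_record)
print(diff.count(0), "of", len(diff), "bytes are zero")

# XOR is its own inverse, so the new record is recovered from the
# difference and the old record alone.
assert xor_records(diff, old_record) == new_record
```

Positions where the two records agree become zero, so a nearly identical pair yields a long, highly redundant run of zeros regardless of how random the records themselves look.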

【００１６】 The data obtained by this exclusive OR operation is then compressed. The conventional technique described above may be used for this compression, or the data compression technique according to this invention, described in detail later, may be used. A compressed data record is created by adding auxiliary data for restoration to the compressed data obtained by this compression processing.

【００１７】 On the other hand, a record of the data file judged to have a low matching rate with every part of the reference file is not compressed; a compressed data record is created simply by adding auxiliary data for restoration to the record. In this case the added auxiliary data actually makes the data longer, but for consistency of terminology this specification also uses the term compressed data record for the data obtained by this processing.

【００１８】 The creation of compressed data records with compression processing and the creation of compressed data records without compression processing, both described above, are carried out for all the records that make up the data file. The compressed data records obtained in this way are edited to create a compressed file. Here, editing will generally mean simply arranging the compressed data records in a fixed order (for example, in order of compressed data record number). The editing will also generally be carried out in parallel with the creation of the compressed data records.

【００１９】 As described above, this invention searches the data file for records that have a high data matching rate with a part of the reference file and performs an exclusive OR operation between these highly matching records, so even strongly random data can be converted into highly redundant records. Because records suited to compression are obtained in this way, highly efficient data compression becomes possible. According to this invention, compression to a fraction of the original size is generally possible, and under favorable conditions compression to one tenth of the original size or less is possible. In trials actually carried out by the inventor, data was compressed to as little as 7% in the best case.

【００２０】 When a data file is to be transmitted, sending the file compressed by the compression method of this invention greatly shortens the communication time.

【００２１】 In one embodiment of this invention, it is first determined whether the data file or a part of it can be compressed by itself, without using the reference file. If so, the data file or that part is compressed by itself. If not, the data compression method using the reference file described above is executed.

【００２２】 Whether the data file or a part of it can be compressed by itself can be judged by whether the data it contains is highly redundant or has regularity. Here too, it is preferable to establish a fixed criterion for compressibility and to judge according to that criterion.
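The source leaves the compressibility criterion open. One possible criterion, shown here purely as an assumption and not as the method of the source, is the Shannon entropy of the byte distribution: low entropy suggests redundancy that an ordinary compressor can exploit. The function names and the cut-off of 6 bits per byte are hypothetical.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def compressible_by_itself(data: bytes, max_bits_per_byte: float = 6.0) -> bool:
    """Hypothetical criterion: treat low-entropy data as compressible on its own."""
    return byte_entropy(data) <= max_bits_per_byte


print(byte_entropy(b"A" * 100))        # a single repeated byte carries no information
print(byte_entropy(bytes(range(256)))) # every byte value equally likely
```

Byte entropy ignores ordering, so it misses regular but evenly distributed data; a real criterion would likely be tuned to the compressor actually used.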

【００２３】 Concretely, the auxiliary data for restoration mentioned above may include a record number, data indicating the position (offset) of the highly matching record in the data file, data indicating the position (offset) of the highly matching record in the reference file, data indicating whether compression using the reference file has been performed, data indicating whether the data has been compressed, data representing the data size, and data relating to the data compression processing.

【００２４】 This invention also provides a data decompression method that restores the data file from a compressed file created by data compression as described above. This decompression method uses the same reference file as the one used in the data compression method. The compressed file can be decompressed by carrying out the data compression processing described above in reverse order.

【００２５】 The data decompression method according to this invention examines the compressed file one compressed data record at a time and refers to the auxiliary data for restoration of each record to determine whether decompression processing is needed. When decompression processing is determined to be needed, the compressed data record is decompressed, and an exclusive OR operation is performed between the decompressed record and the corresponding record in the reference file to create a data record. The data file is restored by editing, with reference to the auxiliary data for restoration, the data records created by this processing together with the data records contained in the compressed data records that need no decompression processing.
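The restoration steps can be sketched as follows. Two substitutions are made for the sake of a runnable example and are not taken from the source: `zlib` stands in for whatever compressor was applied to the XOR data, and each compressed data record is modeled as a plain dictionary with hypothetical keys.

```python
import zlib


def restore_record(rec: dict, reference: bytes) -> bytes:
    """Rebuild one data record from a compressed data record."""
    if not rec["uses_reference"]:
        # No decompression needed: the payload is the original record itself.
        return rec["payload"]
    xored = zlib.decompress(rec["payload"])          # stand-in decompressor
    start = rec["ref_offset"]
    ref_part = reference[start:start + len(xored)]
    return bytes(x ^ y for x, y in zip(xored, ref_part))


def restore_file(records: list, reference: bytes) -> bytes:
    """Edit the restored records back into file order using their offsets."""
    out = bytearray()
    for rec in sorted(records, key=lambda r: r["data_offset"]):
        out += restore_record(rec, reference)
    return bytes(out)


reference = b"0123456789ABCDEF"                       # hypothetical old file
similar = b"456789ABCDEG"                             # differs from reference[4:16] in one byte
xored = bytes(x ^ y for x, y in zip(similar, reference[4:16]))
records = [
    {"data_offset": 4, "ref_offset": 4, "uses_reference": True,
     "payload": zlib.compress(xored)},
    {"data_offset": 0, "uses_reference": False, "payload": b"NEW!"},
]
print(restore_file(records, reference))
```

The receiver needs only the compressed records and its own copy of the reference file.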

【００２６】 Because the original data file is restored in this way, the restored data file can then be used.

【００２７】 In one embodiment of the data decompression method according to this invention, when the data file or a part of it has been compressed by itself, the compressed file or that part is decompressed without using the reference file to restore the data file. This decompression method corresponds to the method, described above, of compressing the data file or a part of it by itself without using the reference file.

【００２８】 This invention further provides a data transmission method.

【００２９】 A system to which this data transmission method is applied presupposes that the transmitting device, which sends the compressed data, and the receiving device, which receives the compressed data sent from the transmitting device, both hold the reference file. The transmitting device compresses the data according to the data compression method described above and sends the resulting compressed data to the receiving device. The receiving device decompresses the received data according to the decompression method described above and restores the original data.

【００３０】 This invention further provides a data compression device and a data decompression device. These devices correspond respectively to the data compression method and the data decompression method described above.

【００３１】 The data compression device according to this invention comprises: search means for partially comparing a data file to be compressed with its corresponding reference file and searching the data file for a record of arbitrary data length that has a high data matching rate with a part of the reference file; data compression means for performing, when the search means finds a record with a high data matching rate, an exclusive OR operation between the record found in the data file and the corresponding record of the reference file, and for compressing the data obtained by the exclusive OR operation; and editing means for creating a compressed data record by adding auxiliary data for restoration to the compressed data obtained by the data compression means, for creating a compressed data record, for any record of the data file that has a low matching rate with every part of the reference file, by adding auxiliary data for restoration to that record, and for editing these compressed data records to create a compressed file.

【００３２】 The data decompression device according to this invention comprises: determination means for examining the compressed file one compressed data record at a time and determining, with reference to the auxiliary data for restoration of each record, whether decompression processing is needed; data decompression means for decompressing the compressed data record when the determination means determines that decompression processing is needed, and for creating a data record by performing an exclusive OR operation between the decompressed record and the corresponding record in the reference file; and editing means for restoring the data file by editing, with reference to the auxiliary data for restoration, the data records created by the data decompression means together with the data records contained in the compressed data records that need no decompression processing.

【００３３】[0033]

【００３４】[0034]

【００３５】[0035]

【００３６】[0036]

【００３７】[0037]

【００３８】 This invention further provides a method of partially comparing a data file to be compressed with its corresponding reference file and finding, in the data file, a record of arbitrary data length that has a high data matching rate with a part of the reference file.

【００３９】 The method of finding similar records according to this invention extracts first partial data of a first predetermined length from the data file and first partial data of the same first predetermined length from the reference file. While the extraction position is shifted by a predetermined number of bytes at a time, at least in the reference file, the first partial data extracted from the data file and from the reference file are compared with each other to check whether there is first partial data with a high matching rate. When first partial data with a high matching rate is found, the shift amount used for extracting the first partial data is fixed. Near the fixed extraction position, second partial data of a second predetermined length, shorter than the first predetermined length, is extracted from the data file and from the reference file. While these extraction positions are shifted by a predetermined number of bytes at a time in the reference file and in the data file, the second partial data extracted from the two files are compared with each other, and the range of the record with a high matching rate is thereby determined in the data file and in the reference file.

【００４０】 In one embodiment of the method of finding similar records according to this invention, one block of data of fixed data length is extracted from the data file, and the checking process described above is performed on the extracted block while the extraction position of the first partial data in the reference file is shifted. If no first partial data with a high matching rate is found, the position of the block to be extracted from the data file is shifted by one block length and the checking process is repeated.

【００４１】 According to this invention, the partial comparison between the data file and the reference file is carried out in two stages. In the first stage, first partial data with a relatively long data length is extracted from the data file and from the reference file and compared roughly. If a part whose matching rate is at or above a predetermined value is found in this first-stage comparison, processing proceeds to the second stage. In the second stage, the extraction position of the highly matching first partial data found in the first stage is fixed, and second partial data with a relatively short data length is extracted from the data file and from the reference file near the fixed position. The second partial data are compared in more detail while being shifted by a predetermined number of bytes at a time, either separately or together, and the range of a record of arbitrary length with a high matching rate is finally determined in the data file and in the reference file.
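The two-stage comparison can be sketched as below. This is a simplified reading under stated assumptions, not the procedure of the source in full: the matching rate is computed byte by byte, the first stage shifts only the reference-file position, the second stage shifts both positions together (the source also allows shifting them separately), and the window lengths, step and threshold are hypothetical defaults.

```python
def match_rate(a: bytes, b: bytes) -> float:
    return sum(x == y for x, y in zip(a, b)) / len(a)


def coarse_search(data, ref, data_pos, first_len=64, step=1, threshold=0.9):
    """Stage 1: slide a long window over the reference file; return the first
    reference offset whose window matches the data window, or None."""
    probe = data[data_pos:data_pos + first_len]
    if len(probe) < first_len:
        return None
    for ref_pos in range(0, len(ref) - first_len + 1, step):
        if match_rate(probe, ref[ref_pos:ref_pos + first_len]) >= threshold:
            return ref_pos
    return None


def refine_range(data, ref, data_pos, ref_pos,
                 first_len=64, second_len=8, threshold=0.9):
    """Stage 2: with the relative shift fixed, grow the matching range backward
    and forward using short windows. Returns (data_start, ref_start, length)."""
    start_d, start_r = data_pos, ref_pos
    while (start_d >= second_len and start_r >= second_len and
           match_rate(data[start_d - second_len:start_d],
                      ref[start_r - second_len:start_r]) >= threshold):
        start_d -= second_len
        start_r -= second_len
    end_d, end_r = data_pos + first_len, ref_pos + first_len
    while (end_d + second_len <= len(data) and end_r + second_len <= len(ref) and
           match_rate(data[end_d:end_d + second_len],
                      ref[end_r:end_r + second_len]) >= threshold):
        end_d += second_len
        end_r += second_len
    return start_d, start_r, end_d - start_d


# Hypothetical files sharing a 200-byte region at different offsets.
common = bytes((i * 7 + 3) % 251 for i in range(200))
ref = b"\xAA" * 40 + common + b"\xBB" * 40
data = b"\xCC" * 16 + common + b"\xDD" * 24

ref_pos = coarse_search(data, ref, data_pos=32)
print(ref_pos, refine_range(data, ref, 32, ref_pos))
```

In this sketch the boundaries are resolved only to multiples of the short window length; a finer shift in the second stage would locate them more exactly at the cost of more comparisons.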

【００４２】 Thus, according to this invention, a part of the data file and a part of the reference file are compared roughly in the first stage, and the detailed comparison of the second stage is carried out only after a similar part has been found. Similar records can therefore be found in the data file and the reference file in far less time than if the detailed comparison were carried out from the start. This makes the data compression method described above practical.

【００４３】 When a record with a high data matching rate with a part of the reference file is found in the data file in this way, the exclusive OR of the record found in the data file and the corresponding record of the reference file is computed as described above. The result of this exclusive OR operation is a data string containing many consecutive zero words.

【００４４】[0044]

【００４５】[0045]

【００４６】[0046]

【００４７】[0047]

【００４８】[0048]

【００４９】[0049]

【００５０】[0050]

【００５１】[0051]

【００５２】[0052]

【００５３】[0053]

【００５４】[0054]

【００５５】[0055]

【００５６】[0056]

[Explanation of Examples]

(1) Overview of the overall processing

Assume, as shown in FIG. 1, that data is sent from a transmitting device 10 to a receiving device 20. The transmitting and receiving devices 10 and 20 are, for example, computer systems, and the data to be sent is a program.

【００５７】 The transmitting and receiving devices 10 and 20 both hold the same reference file FS. The reference file FS is, for example, the old version of a program that these devices 10 and 20 execute.

【００５８】 In the transmitting device 10, a new version of the program is created by upgrading the old version. This new-version program is the data file FD to be sent to the receiving device 20. On receiving the new-version program, the receiving device 20 can replace the old-version program with it.

【００５９】 To transmit the data file FD, which is the new-version program, to the receiving device 20, the transmitting device 10 compresses the data file FD using the reference file FS and creates a compressed file FC. The compressed file FC is transmitted to the receiving device 20 over a wired link (for example a digital line) or a wireless link.

【００６０】 On receiving the compressed file FC sent from the transmitting device 10, the receiving device 20 decompresses the compressed file FC using the reference file FS that it holds and restores the data file FD.

【００６１】 FIG. 2 shows how the transmitting device 10 compresses the data file FD using the reference file FS.

【００６２】 The data file FD and the reference file FS are partially compared, and records of arbitrary data length with a high data matching rate are searched for. In the example shown in FIG. 2, records DR₂, DR₄ and DR₅ in the data file FD are judged to be similar to records SR₂, SR₄ and SR₅ of the reference file FS respectively, that is, to have a high data matching rate (hereinafter simply called similar). Records DR₂ and SR₂ have the same data length. Likewise, records DR₄ and SR₄ have the same data length, as do records DR₅ and SR₅. The data lengths of records DR₂, DR₄ and DR₅ may be the same or may differ. In general they will differ.

【００６３】 As stated above, the reference file FS is the old version of the program, and the data file FD is the new version created by improving part of the old version. The parts of the reference file FS that were not modified therefore form part of the data file FD unchanged. Because the data file FD and the reference file FS have many parts (records) that almost match each other, mutually similar records such as DR₂ and SR₂, DR₄ and SR₄, and DR₅ and SR₅ can be extracted.

【００６４】 In the data file FD, records DR₁, DR₃, DRₙ and so on are those for which no part with a high data matching rate (no similar part) could be found in the reference file FS. Routines newly added in the version upgrade and program parts that were completely rewritten will correspond to these records DR₁, DR₃, DRₙ and so on.

【００６５】 Next, the exclusive OR (hereinafter XOR) is computed between each record of the data file FD and the corresponding record of the reference file FS that were judged to be similar to each other. The result is compressed, a compressed data record is created from the compressed data and the restoration auxiliary code described later, and the compressed data record is written to the compressed file FC.

【００６６】データ・ファイルＦＤのレコードＤＲ₂と
これに類似する参照ファイルＦＳのレコードＳＲ₂との
ＸＯＲが演算されて中間データＸＲＤ₂が作成される。
レコードＤＲ₂とＳＲ₂とは相互に一致するデータを多
く含んでいるのでそれらのＸＯＲ演算結果である中間デ
ータＸＲＤ₂は０を多く含むデータとなる。この中間デ
ータＸＲＤ₂は冗長度が高いのでデータ圧縮に適してい
る。データ圧縮処理の手法としては公知のものを採用す
ることもできるが，後に示す手法を用いることが好まし
い。データ圧縮処理により生成された圧縮データは後述
するフォーマットにしたがって復元用補助コードととも
に編集されて圧縮データ・レコードｄｒ₂となる。The record DR₂ of the data file FD is XORed with the similar record SR₂ of the reference file FS to create intermediate data XRD₂. Because DR₂ and SR₂ contain much mutually matching data, XRD₂ consists largely of zeros. This intermediate data is highly redundant and therefore well suited to data compression. A known compression method may be used, but the method described later is preferable. The compressed data is then edited, together with the auxiliary code for restoration, according to the format described later, to become the compressed data record dr₂.
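As a minimal illustration of why the intermediate data is highly redundant, the following Python sketch XORs two hypothetical equal-length records that differ in a single byte. The record contents and the name xor_records are assumptions made for illustration only; they do not come from the specification.

```python
def xor_records(dr: bytes, sr: bytes) -> bytes:
    """Byte-wise XOR of two equal-length records."""
    if len(dr) != len(sr):
        raise ValueError("records must have the same length")
    return bytes(a ^ b for a, b in zip(dr, sr))

# Hypothetical old-version and new-version records (one byte differs).
sr2 = b"MOV AX, 0001h\r\nINT 21h\r\n"
dr2 = b"MOV AX, 0002h\r\nINT 21h\r\n"

xrd2 = xor_records(dr2, sr2)      # intermediate data
zero_count = xrd2.count(0)        # number of 00 words in the result
print(xrd2.hex(" "), zero_count)
```

Every position where the two records agree yields 00, so only the changed byte survives as a non-zero word.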

【００６７】同じように，レコードＤＲ₄とＳＲ₄との
ＸＯＲ演算により中間データＸＲＤ₄が作成され，レコ
ードＤＲ₅とＳＲ₅とのＸＯＲ演算により中間データＸ
ＲＤ₅が作成される。これらの中間データＸＲＤ₄，Ｘ
ＲＤ₅がそれぞれ圧縮処理され，圧縮データ・レコード
・フォーマットにしたがって圧縮データ・レコードｄｒ
₄，ｄｒ₅となる。Similarly, intermediate data XRD₄ is created by XORing records DR₄ and SR₄, and intermediate data XRD₅ by XORing records DR₅ and SR₅. XRD₄ and XRD₅ are each compressed and edited according to the compressed data record format to become the compressed data records dr₄ and dr₅.

【００６８】参照ファイルＦＳ中に類似する部分を見付
けることができなかったデータ・レコードＤＲ₁，ＤＲ
₃，ＤＲ_n等については圧縮処理ができないので，その
ままの形で復元用補助コードとともに圧縮データ・レコ
ード・フォーマットにしたがって編集され，圧縮データ
・レコードｄｒ₁，ｄｒ₃，ｄｒ_n等となる。これらの
レコードｄｒ₁，ｄｒ₃，ｄｒ_nは元のレコードＤ
Ｒ₁，ＤＲ₃，ＤＲ_nよりも復元用補助コードの分だけ
データ・サイズが大きくなっているが，ここでは用語の
統一のために圧縮データ・レコードと呼ぶことにする。The data records DR₁, DR₃, DRₙ and so on, for which no similar part could be found in the reference file FS, cannot be compressed. They are therefore edited as they are, together with the auxiliary code for restoration, according to the compressed data record format, to become the compressed data records dr₁, dr₃, drₙ and so on. These records are larger than the original records DR₁, DR₃, DRₙ by the size of the auxiliary code for restoration, but for uniform terminology they are still called compressed data records here.

【００６９】このようにして作成された圧縮データ・レ
コードｄｒ₁，ｄｒ₂，…，ｄｒ_nが一定の順序（たと
えば後述するレコードNO. の順）に並べられることによ
り圧縮ファイルＦＣが得られる。実際の処理においては
圧縮データ・レコードの作成ごとに作成された圧縮デー
タ・レコードが圧縮ファイルＦＣ内に配列されていくで
あろう。The compressed file FC is obtained by arranging the compressed data records dr₁, dr₂, …, drₙ created in this way in a fixed order (for example, in order of the record number described later). In actual processing, each compressed data record would be placed in the compressed file FC as soon as it is created.

【００７０】圧縮ファイルＦＣは元のデータ・ファイル
ＦＤに比べると，全体として数分の一から１／10程度，
またはそれ以上にデータ圧縮されている。圧縮ファイル
ＦＣには圧縮データ・レコードｄｒ₁，ｄｒ₃，ｄｒ_n
のようにデータ圧縮されていないレコードも含まれてい
るが，その数は比較的少なく，かつレコードｄｒ₂，ｄ
ｒ₄，ｄｒ₅のようにデータ圧縮されているものの圧縮
率が高いので，全体としてみた場合にもかなり高い圧縮
率を得ることができる。Compared with the original data file FD, the compressed file FC as a whole is compressed to between a fraction and about 1/10 of the original size, or even further. The compressed file FC also contains records that are not compressed, such as dr₁, dr₃ and drₙ, but these are relatively few, and the compressed records such as dr₂, dr₄ and dr₅ have a high compression ratio, so a considerably high compression ratio is obtained overall.

【００７１】図３は受信装置20において実行される参照
ファイルＦＳを利用した圧縮ファイルＦＣの伸張処理の
様子を示している。FIG. 3 shows a state of decompression processing of the compressed file FC using the reference file FS executed by the receiving device 20.

【００７２】伸張処理は上述した圧縮処理の逆の手順で
行われる。参照ファイル中のレコードを利用して圧縮さ
れた圧縮データ・レコードｄｒ₂，ｄｒ₄，ｄｒ₅等に
ついては，まず伸張処理により中間データＸＲＤ₂，Ｘ
ＲＤ₄，ＸＲＤ₅に変換される。これらの中間データＸ
ＲＤ₂，ＸＲＤ₄，ＸＲＤ₅と参照ファイルＦＳの対応
するレコードＳＲ₂，ＳＲ₄，ＳＲ₅とのＸＯＲ演算に
より，データ・レコードＤＲ₂，ＤＲ₄，ＤＲ₅がそれ
ぞれ復元される。Decompression is performed in the reverse order of the compression described above. The compressed data records dr₂, dr₄, dr₅ and so on, which were compressed using records in the reference file, are first decompressed into the intermediate data XRD₂, XRD₄ and XRD₅. XORing these intermediate data with the corresponding records SR₂, SR₄ and SR₅ of the reference file FS restores the data records DR₂, DR₄ and DR₅, respectively.

【００７３】圧縮処理の施されていない圧縮データ・レ
コードｄｒ₁，ｄｒ₃，ｄｒ_nについてはそれらから復
元用補助コードが除去されることにより，元のデータ・
レコードＤＲ₁，ＤＲ₃，ＤＲ_nが得られる。For the compressed data records dr₁, dr₃ and drₙ, which were not compressed, the original data records DR₁, DR₃ and DRₙ are obtained by removing the auxiliary code for restoration.

【００７４】このようにして復元されたレコードＤ
Ｒ₂，ＤＲ₄，ＤＲ₅，ＤＲ₁，ＤＲ₃，ＤＲ_nが元の
順序に配列されれば，最終的にデータ・ファイルＦＤが
復元されたことになる。When the records DR₂, DR₄, DR₅, DR₁, DR₃ and DRₙ restored in this way are arranged in their original order, the data file FD is finally restored.

【００７５】(2) 類似レコードの検索処理データ・ファイルＦＤと参照ファイルＦＳとを部分的に
比較して，参照ファイルＦＳの一部と類似するレコード
をデータ・ファイルＦＤにおいて捜し出す処理について
説明する。(2) Search for similar records. This section describes the process that partially compares the data file FD with the reference file FS and finds, in the data file FD, records similar to a part of the reference file FS.

【００７６】図４はデータ・ファイルＦＤおよび参照フ
ァイルＦＳの先頭からそれぞれ１ブロック（たとえば10
24バイト）を取出した様子を示すものである。FIG. 4 shows one block (for example, 1024 bytes) taken from the beginning of each of the data file FD and the reference file FS.

【００７７】上述したように，データ・ファイルＦＤの
レコードＤＲ₂と参照ファイルＦＳのレコードＳＲ₂と
が類似している。レコードＤＲ₂はデータ・ファイルＦ
Ｄの先頭位置Ａからデータ長でＯＦＦＤ₂後方に進んだ
位置Ｂから始まり，位置Ｃで終る。レコードＳＲ₂は参
照ファイルＦＳの先頭位置Ｄから長さＯＦＦＳ₂の位置
Ｅから始まり，位置Ｆまで続く。上述したようにレコー
ドＤＲ₂とＳＲ₂の長さは等しいが，レコードＤＲ₂が
始まる位置ＢとレコードＳＲ₂が始まる位置Ｅとは異な
る。ファイルの先頭から各レコードが始まる位置までの
データ長（ＯＦＦＤ₂やＯＦＦＳ₂）をそのレコードの
オフセットという。As described above, the record DR₂ of the data file FD and the record SR₂ of the reference file FS are similar. The record DR₂ starts at position B, which is a data length OFFD₂ after the head position A of the data file FD, and ends at position C. The record SR₂ starts at position E, which is a length OFFS₂ after the head position D of the reference file FS, and continues to position F. As noted above, DR₂ and SR₂ have the same length, but the start position B of DR₂ differs from the start position E of SR₂. The data length from the beginning of a file to the position where a record starts (OFFD₂ or OFFS₂) is called the offset of that record.

【００７８】類似レコードを検索する処理は結局のとこ
ろ，類似するレコードＤＲ₂とＳＲ₂の開始位置ＢとＥ
（オフセットＯＦＦＤ₂とＯＦＦＳ₂）および終了位置
ＣとＦをそれぞれ見付け出すための処理である。The search for similar records ultimately amounts to finding the start positions B and E (the offsets OFFD₂ and OFFS₂) and the end positions C and F of the similar records DR₂ and SR₂.

【００７９】図４に示す例では見付け出された類似レコ
ードＤＲ₂，ＳＲ₂は１ブロック長よりも短い。したが
って，これらのレコードＤＲ₂の全体とレコードＳＲ₂
の全体とがＸＯＲ演算される。見付け出された類似レコ
ードが１ブロック長よりも長い場合には，これらの類似
レコードは１ブロックずつ分割され，分割された１ブロ
ックごとにＸＯＲ演算が行われ，ＸＯＲ演算結果の圧縮
処理が行われ，圧縮データ・レコードの作成が行われ
る。すなわち，この実施例では，すべてのデータは１ブ
ロックを単位として（または１ブロックよりも短いデー
タ長の状態で）すべての処理が施される。In the example shown in FIG. 4, the similar records DR₂ and SR₂ that were found are shorter than one block. The whole of record DR₂ is therefore XORed with the whole of record SR₂. When the similar records found are longer than one block, they are divided into blocks, and for each block the XOR operation is performed, the XOR result is compressed, and a compressed data record is created. In other words, in this embodiment all processing is applied to data in units of one block (or of a length shorter than one block).
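The block-wise handling described here can be sketched as follows. The 1024-byte block size is the example value mentioned for FIG. 4; the name split_blocks is hypothetical and the sketch is only one way of dividing a long record.

```python
BLOCK = 1024  # example block length mentioned for FIG. 4

def split_blocks(record: bytes, block: int = BLOCK) -> list:
    """Divide a record into consecutive pieces of at most `block` bytes."""
    return [record[i:i + block] for i in range(0, len(record), block)]

# A 2500-byte record yields two full blocks and one shorter remainder.
parts = split_blocks(bytes(2500))
print([len(p) for p in parts])
```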

【００８０】類似レコードの検索処理は大雑把な第１段
階の処理とより詳細な第２段階の処理とからなる。The similar record search process includes a rough first stage process and a more detailed second stage process.

【００８１】まず，第１段階の処理について説明する。First, the processing in the first stage will be described.

【００８２】図５(A) に示すように，データ・ファイル
ＦＤの１ブロックと参照ファイルＦＳの１ブロックから
第１の所定長の第１の部分データｆｄ₁，ｆｄ₂，ｆｄ
₃とｆｓ₁₀，ｆｓ₂₀，ｆｓ₃₀とが抜き出される。この実
施例では各ブロックの先端部と後端部と中間部の３箇所
において第１の部分データが抜き出されている。第１の
部分データの取出しは１箇所でも，２箇所でも，４箇所
以上でもよい。また，この実施例では第１の部分データ
のデータ長は10バイトである。As shown in FIG. 5(A), first partial data of a first predetermined length, fd₁, fd₂, fd₃ and fs₁₀, fs₂₀, fs₃₀, are extracted from one block of the data file FD and one block of the reference file FS, respectively. In this embodiment the first partial data are extracted at three places in each block: the leading end, the trailing end and the middle. The first partial data may instead be extracted at one place, two places, or four or more places. In this embodiment the first partial data are 10 bytes long.

【００８３】これらの第１の部分データの対応するもの
同志が相互に比較される。すなわち，データｆｄ₁とデ
ータｆｓ₁₀とが比較され，それらの一致率が所定値以上
かどうかが判断される。同じようにデータｆｄ₂とｆｓ
₂₀とが比較され，データｆｄ₃とｆｓ₃₀とが比較され，
それらの一致率が所定値以上かどうかがそれぞれ判定さ
れる。データが一致しているかどうかは一般にはワード
単位（１バイト）で行われるであろう。Corresponding first partial data are compared with each other. That is, the data fd₁ is compared with the data fs₁₀, and it is judged whether their matching rate is at or above a predetermined value. Likewise, fd₂ is compared with fs₂₀ and fd₃ with fs₃₀, and in each case it is judged whether the matching rate is at or above the predetermined value. Whether data match would generally be checked word by word (one word being 1 byte).

【００８４】いずれの比較においても一致率が所定値以
下の場合には，データ・ファイルＦＤにおける第１の部
分データｆｄ₁，ｆｄ₂，ｆｄ₃をそのままにしておい
て，参照ファイルＦＳにおいて抜出すべき第１の部分デ
ータを図５(B) に示すように後方に１バイトシフトす
る。参照ファイルＦＳにおいて次に抜出される部分デー
タをｆｓ₁₁，ｆｓ₂₁，ｆｓ₃₁とする。これらの部分デー
タｆｄ₁，ｆｄ₂，ｆｄ₃とｆｓ₁₁，ｆｓ₂₁，ｆｓ₃₁と
をそれぞれ比較してそのデータ一致率の判定を行う。If the matching rate is at or below the predetermined value in every comparison, the first partial data fd₁, fd₂, fd₃ in the data file FD are left as they are, and the first partial data to be extracted from the reference file FS are shifted by 1 byte toward the rear, as shown in FIG. 5(B). Let the partial data next extracted from the reference file FS be fs₁₁, fs₂₁, fs₃₁. The partial data fd₁, fd₂, fd₃ are compared with fs₁₁, fs₂₁, fs₃₁, respectively, and the data matching rate is judged.

【００８５】参照ファイルＦＳから取出すべき部分デー
タを１バイトずつシフトしながら上記の処理を繰返して
いく。The above processing is repeated while shifting the partial data to be taken out from the reference file FS by 1 byte.

【００８６】図５(C) に示すように，参照ファイルＦＳ
から取出すべき部分データを参照ファイルＦＳの先頭か
らｍバイトシフトして，ｆｄ₁，ｆｄ₂，ｆｄ₃とｆｓ
_1m，ｆｓ_2m，ｆｓ_3mとをそれぞれ比較したときに，デー
タ・ファイルの部分データｆｄ₂と参照ファイルの部分
データｆｓ_2mとの一致率が所定値を超えたとすると，こ
れらの部分データｆｄ₂，ｆｓ_2mを含むある範囲におい
てデータ・ファイルＦＤと参照ファイルＦＳとが類似し
ていると考えられるので，類似している範囲を定めるた
めに第２段階の処理に進む。As shown in FIG. 5(C), suppose that the partial data to be extracted from the reference file FS have been shifted m bytes from the beginning of the reference file FS and that, when fd₁, fd₂, fd₃ are compared with fs₁ₘ, fs₂ₘ, fs₃ₘ, respectively, the matching rate between the partial data fd₂ of the data file and the partial data fs₂ₘ of the reference file exceeds the predetermined value. The data file FD and the reference file FS are then considered to be similar over some range that includes fd₂ and fs₂ₘ, so processing proceeds to the second stage to determine that range.

【００８７】部分データｆｄ₂とｆｓ_2mとの一致率が所
定値を超えたときに，参照ファイルＦＳから取出す第１
の部分データをｆｓ_2mの位置から１バイトずつシフト
し，これらの第１の部分データとデータ・ファイルＦＤ
の部分データｆｄ₂とを比較して，部分データｆｄ₂と
最も一致率の高い参照ファイルＦＳの部分データを捜し
出すようにすると一層好ましい。More preferably, when the matching rate between fd₂ and fs₂ₘ exceeds the predetermined value, the first partial data extracted from the reference file FS are shifted further, 1 byte at a time, from the position of fs₂ₘ and compared with the partial data fd₂ of the data file FD, so as to find the partial data of the reference file FS that has the highest matching rate with fd₂.

【００８８】もし，参照ファイルＦＳから取出すべき第
１の部分データを順次１バイトずつシフトしていって，
取出すべき第１の部分データが参照ファイルＦＳの終端
にきてしまったときには，データ・ファイルＦＤから先
に取出した１ブロック長のデータと類似する部分は参照
ファイルＦＳには存在しないと判断して，データ・ファ
イルＦＤから次の１ブロック長のデータを取出して，上
記と同じような処理を繰返していく。If the first partial data to be extracted from the reference file FS are shifted 1 byte at a time until they reach the end of the reference file FS, it is judged that the reference file FS contains no part similar to the one-block-length data previously taken from the data file FD. The next one-block-length data is then taken from the data file FD and the same processing is repeated.
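The first-stage search can be sketched roughly as below. This is a simplified reading, not the patented implementation: the names match_rate and first_stage, the threshold of 0.8 (the text only says "a predetermined value"), and the rule that the search stops when the shifted block no longer fits inside the reference file are all assumptions.

```python
from typing import Optional, Tuple

def match_rate(a: bytes, b: bytes) -> float:
    """Fraction of byte positions at which a and b agree."""
    if not a or len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)

def first_stage(fd_block: bytes, fs: bytes, probe_len: int = 10,
                threshold: float = 0.8) -> Optional[Tuple[int, int]]:
    """Return (shift m, probe position within the block) for the first probe
    whose match rate exceeds the threshold, or None if no shift qualifies."""
    n = len(fd_block)
    if n < probe_len:
        return None
    # Probes at the leading end, the middle and the trailing end of the block.
    positions = [0, (n - probe_len) // 2, n - probe_len]
    for m in range(len(fs) - n + 1):          # shift the reference side 1 byte at a time
        for pos in positions:
            fd_probe = fd_block[pos:pos + probe_len]
            fs_probe = fs[m + pos:m + pos + probe_len]
            if match_rate(fd_probe, fs_probe) > threshold:
                return m, pos
    return None

# Hypothetical data: the block appears in the reference file 7 bytes in.
block = bytes(range(64))
fs = bytes([255]) * 7 + block + bytes([255]) * 5
print(first_stage(block, fs))
```

The data-file probes stay fixed while only the reference-file side moves, which mirrors the shifting shown in FIG. 5(A) to 5(C).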

【００８９】次に図６を参照して第２段階の処理につい
て説明する。Next, the second stage processing will be described with reference to FIG.

【００９０】第２段階の処理では，第１段階の処理にお
いて一致率が所定値以上である（または一致率が最も高
い）と判定された第１の部分データｆｄ₂とｆｓ_2mの位
置を固定する。すなわち，参照ファイルＦＳののシフト
量ｍを固定する。そして，データ・ファイルＦＤの第１
の部分データｆｄ₂の中から第１の部分データよりもデ
ータ長の短い第２の部分データｄ₁を取出す。この実施
例では第２の部分データｄ₁は１バイト長（１ワード）
であるので，以下第２の部分データをワードということ
にする。参照ファイルＦＳからもワードｄ₁に対応する
位置の近傍においてワードＳ₁₁，Ｓ₁₂またはＳ₁₃等を取
出し，これらとワードｄ₁とを順次比較する。In the second stage, the positions of the first partial data fd₂ and fs₂ₘ, whose matching rate was judged in the first stage to be at or above the predetermined value (or to be the highest), are fixed. That is, the shift amount m of the reference file FS is fixed. Second partial data d₁, shorter than the first partial data, is then taken from within the first partial data fd₂ of the data file FD. In this embodiment the second partial data d₁ is 1 byte (1 word) long, so the second partial data is referred to below as a word. Words S₁₁, S₁₂, S₁₃ and so on are likewise taken from the reference file FS in the vicinity of the position corresponding to the word d₁, and these are compared with the word d₁ in turn.
【００９１】ワードｄ₁とＳ₁₂とが一致したとすれば，
次にこれらの左または右に隣接するワードを両ファイル
ＦＤ，ＦＳから取出して一致するかどうかを調べる。両
ファイルＦＤ，ＦＳから取出すべきワードを１ワードず
つ一方向にシフトしながら，一致しないワードが所定組
出現するまでたとえばｄ₂とＳ₂₂，ｄ₃とＳ₃₂というよ
うに順次比較していく。If the words d₁ and S₁₂ match, the words adjacent to them on the left or right are next taken from both files FD and FS and checked for a match. The words taken from both files are shifted one word at a time in one direction and compared in sequence, for example d₂ with S₂₂ and d₃ with S₃₂, until a predetermined number of non-matching pairs has appeared.

【００９２】ワードｄ₁とＳ₁₁も一致した場合には，参
照ファイルＦＳのシフト量を（ｍ−１）に固定して，同
じようにこれらの左または右に隣接するワードを取出し
て一致するかどうかを調べる。取出すべきワードを１ワ
ードずつ一方向にシフトしながら，一致しないワードが
所定組出現するまで，たとえばｄ₂とＳ₂₁，ｄ₃とＳ₃₁
というように順次比較していく。If the words d₁ and S₁₁ also match, the shift amount of the reference file FS is fixed at (m − 1), and the words adjacent on the left or right are likewise taken and checked for a match. The words are shifted one word at a time in one direction and compared in sequence, for example d₂ with S₂₁ and d₃ with S₃₁, until a predetermined number of non-matching pairs has appeared.

【００９３】このようにして，一致するワードの組が数
多く（所定割合以上）出現する範囲のうちで最も広い範
囲を捜し出して，その範囲をデータ・ファイルＦＤにお
いてはＢ〜Ｃ，参照ファイルＦＳにおいてはＥ〜Ｆと同
定する。ＢとＥがこのようにして捜し出されたレコード
のオフセットである。In this way, the widest of the ranges in which many matching word pairs appear (at or above a predetermined proportion) is found and identified as B to C in the data file FD and E to F in the reference file FS. B and E are the offsets of the records found in this way.
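A rough sketch of the second-stage range extension follows. It assumes a matching word pair has already been found at index i in the data file and index j in the reference file, and it stops each direction after a fixed number of mismatches (3 here, an assumed value). The proportion test mentioned in the text is not modelled, and the names extend and second_stage are hypothetical.

```python
def extend(fd: bytes, fs: bytes, i: int, j: int, step: int,
           max_mismatch: int = 3) -> int:
    """Walk away from the seed pair fd[i]/fs[j] in direction `step` (+1 or -1).
    Return the distance of the last matching pair seen before `max_mismatch`
    non-matching pairs have appeared (trailing mismatches are dropped)."""
    mismatches = 0
    reach = 0
    k = 1
    while 0 <= i + k * step < len(fd) and 0 <= j + k * step < len(fs):
        if fd[i + k * step] == fs[j + k * step]:
            reach = k
        else:
            mismatches += 1
            if mismatches >= max_mismatch:
                break
        k += 1
    return reach

def second_stage(fd: bytes, fs: bytes, i: int, j: int, max_mismatch: int = 3):
    """Return ((B, C), (E, F)) as half-open index ranges in fd and fs."""
    left = extend(fd, fs, i, j, -1, max_mismatch)
    right = extend(fd, fs, i, j, +1, max_mismatch)
    return (i - left, i + right + 1), (j - left, j + right + 1)

# Hypothetical files sharing the text "HELLO WORLD" at different offsets.
fd = b"111" + b"HELLO WORLD" + b"222"
fs = b"99999" + b"HELLO WORLD" + b"8888"
(b, c), (e, f) = second_stage(fd, fs, 8, 10)   # seed: the space character
print(fd[b:c], fs[e:f])
```

Here b and e play the role of the offsets OFFD and OFFS of the similar records.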

【００９４】もし，ワードｄ₁とワードＳ₁₁，Ｓ₁₂，Ｓ
₁₃等が一致しなかった場合には，データ・ファイルＦＤ
においてワードｄ₁に隣接するワードｄ₄と参照ファイ
ルＦＳにおける対応する位置の近傍のワードＳ₁₂，
Ｓ₁₃，Ｓ₁₄等とを比較し，一致するワードの組を捜すこ
とになる。If the word d₁ matches none of the words S₁₁, S₁₂, S₁₃ and so on, the word d₄ adjacent to d₁ in the data file FD is compared with the words S₁₂, S₁₃, S₁₄ and so on near the corresponding position in the reference file FS, to look for a matching pair of words.

【００９５】再び図４を参照して，このようにしてデー
タ・ファイルＦＤのレコードＤＲ₂と参照ファイルＦＳ
のレコードＳＲ₂とが類似すると判定されると，次にデ
ータ・ファイルＦＤからは鎖線で示すようにレコードＤ
Ｒ₂に続く１ブロック長のデータが読出され，この１ブ
ロック長のデータについて上述した第１段階と第２段階
の処理が行われる。Referring again to FIG. 4, once the record DR₂ of the data file FD and the record SR₂ of the reference file FS have been judged similar in this way, the one-block-length data following the record DR₂ is read from the data file FD, as indicated by the chain line, and the first-stage and second-stage processing described above is performed on it.

【００９６】このような処理を繰返していくことによ
り，データ・ファイルＦＤの全域について，参照ファイ
ルＦＳの部分と類似するレコードが見付け出される。By repeating such processing, a record similar to the part of the reference file FS is found in the entire area of the data file FD.

【００９７】(3) 圧縮処理このようにして見付け出された参照ファイルＦＳのレコ
ードＳＲ_iと類似するデータ・ファイルＦＤのレコード
ＤＲ_iをレコードＳＲ_iを利用して圧縮する処理の一例
について説明する。(3) Compression processing. This section describes an example of the process that compresses a record DRᵢ of the data file FD, found as described above to be similar to a record SRᵢ of the reference file FS, by using the record SRᵢ.

【００９８】レコードＤＲ_i，ＳＲ_iが１ブロック長よ
り長い場合には，上述したように，１ブロックごとにＸ
ＯＲ演算が行われ，１ブロックごとに圧縮処理が行われ
る。When the records DRᵢ and SRᵢ are longer than one block, the XOR operation and the compression are each performed block by block, as described above.

【００９９】ＸＯＲ演算後の１ブロック長（またはこれ
よりも短い）の中間データは上述したようにワード００
（16進数表現，以下同じ）を多く含んでいる。こような
中間データを構成する全ワードについて，その出現頻度
の統計をとり，出現頻度の高い方から順に所定種類数の
ワードを選択する。この実施例では出現頻度が１番，２
番および３番の３種類のワードを選択するものとし，こ
れらをＨ１，Ｈ２，Ｈ３とする。Ｈ１，Ｈ２，Ｈ３を高
頻度ワードということにする。殆どの場合，Ｈ１はワー
ド００であろう。The intermediate data of one block length (or shorter) obtained by the XOR operation contains many words 00 (hexadecimal notation; the same applies below), as described above. Occurrence-frequency statistics are taken over all words making up such intermediate data, and a predetermined number of word values are selected in descending order of frequency. In this embodiment the three word values with the first, second and third highest frequencies are selected and denoted H1, H2 and H3. H1, H2 and H3 are called high-frequency words. In most cases H1 will be the word 00.

【０１００】次に，その１ブロック長の中間データには
出現しないワードを圧縮のために必要とする種類数捜し
出す。この実施例では９種類のワードが必要であると
し，それらをＳ１，Ｓ２，Ｓ３，…，Ｓ９で表わす。こ
れらのワードを置換ワードということにする。Next, as many words as the compression requires are found among the words that do not appear in that one-block-length intermediate data. In this embodiment nine such words are needed; they are denoted S1, S2, S3, …, S9 and are called replacement words.
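The selection of high-frequency words and replacement words can be sketched with a frequency count. The function name choose_words is hypothetical, and raising an error when fewer than nine unused word values exist is an assumption; the text does not say how that case is handled.

```python
from collections import Counter

def choose_words(xrd: bytes, n_high: int = 3, n_repl: int = 9):
    """Return (high-frequency words H1.., replacement words S1..) for one block."""
    counts = Counter(xrd)
    # H1, H2, H3: the most frequent word values, most frequent first.
    high = [w for w, _ in counts.most_common(n_high)]
    # S1..S9: word values that never occur in this block.
    unused = [w for w in range(256) if w not in counts]
    if len(unused) < n_repl:
        raise ValueError("not enough unused word values in this block")
    return high, unused[:n_repl]

# Hypothetical intermediate data dominated by 00.
xrd = bytes([0x00] * 10 + [0x03] * 4 + [0x88] * 2 + [0x01])
high, repl = choose_words(xrd)
print([f"{w:02X}" for w in high], [f"{w:02X}" for w in repl])
```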

【０１０１】さらにこの実施例では頻繁に出現する連続
文字もデータ圧縮の対象とする。たとえば，この種の連
続文字としてはテキスト・データの場合よく現われる改
行，復帰を表わす０Ｄ０Ａがある。In this embodiment, frequently occurring sequences of consecutive characters are also subject to data compression. An example is 0D0A, the carriage return and line feed sequence, which appears often in text data.

【０１０２】データ圧縮のために次のような置換を行
う。The following replacement is performed for data compression.

【０１０３】ワードＨ１が２個連続した場合，これらを
ワードＳ１で置換する。When two words H1 are consecutive, these are replaced by the word S1.

【０１０４】ワードＨ１が３個連続した場合，これらを
ワードＳ２で置換する。When three consecutive words H1 are present, these are replaced by the word S2.

【０１０５】ワードＨ１が４個連続した場合，これらを
ワードＳ３で置換する。When four words H1 are consecutive, these are replaced by the word S3.

【０１０６】ワードＨ１が５個以上連続した場合，これ
らをワードＳ４と連続する個数を表わす数字（８ビット
で255 個の連続まで表現可能）との組合せで表現する。When five or more words H1 are consecutive, they are expressed by the word S4 followed by a number giving the run length (with 8 bits, runs of up to 255 can be represented).

【０１０７】ワードＨ２が２個連続した場合，これらを
ワードＳ５で置換する。When two words H2 are consecutive, these are replaced by the word S5.

【０１０８】ワードＨ２が３個以上連続した場合，これ
らをワードＳ６と連続する個数を表わす数字の組合せに
より表現する。When three or more words H2 are consecutive, they are expressed by the word S6 followed by a number giving the run length.

【０１０９】ワードＨ３が２個連続した場合，これらを
ワードＳ７で置換する。When two words H3 are consecutive, these are replaced by the word S7.

【０１１０】ワードＨ３が３個以上連続した場合，これ
らをワードＳ８と連続する個数を表わす数字の組合せに
より表現する。When three or more words H3 are consecutive, they are expressed by the word S8 followed by a number giving the run length.

【０１１１】連続文字０Ｄ０ＡをワードＳ９で置換す
る。Replace consecutive characters 0D 0A with word S9.

【０１１２】これらをまとめると次のようになる。These are summarized as follows.

【０１１３】
H1 H1 → S1
H1 H1 H1 → S2
H1 H1 H1 H1 → S3
H1 repeated five or more times → S4 + count
H2 H2 → S5
H2 repeated three or more times → S6 + count
H3 H3 → S7
H3 repeated three or more times → S8 + count
0D 0A → S9

【０１１４】類似するレコードのＸＯＲ演算結果は，何
回も繰返すように，ワード００を多く含み，かつ出現す
るワードの種類数も多くはない。このため，上述のよう
な圧縮方法により，非常に効率の高いデータ圧縮が可能
となる。As stated repeatedly, the XOR result of similar records contains many words 00, and the number of distinct word values that appear is not large. The compression method described above therefore achieves very efficient data compression.
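The replacement rules summarized above can be sketched as an encoder and a matching decoder. This is an illustrative reading only: the function names are hypothetical, high is assumed to hold exactly three distinct words H1 to H3, repl is assumed to hold nine distinct words S1 to S9 that do not occur in the input, and splitting runs longer than 255 into several counted runs is an assumption.

```python
def compress(data: bytes, high, repl) -> bytes:
    h1, h2, h3 = high
    s = repl                                   # s[0] is S1 ... s[8] is S9
    out = bytearray()
    i = 0
    while i < len(data):
        w = data[i]
        run = 1                                # length of the run starting at i (max 255)
        while i + run < len(data) and data[i + run] == w and run < 255:
            run += 1
        if w == h1 and run >= 2:
            if run <= 4:
                out.append(s[run - 2])         # S1, S2 or S3
            else:
                out += bytes([s[3], run])      # S4 + count
        elif w in (h2, h3) and run >= 2:
            base = 4 if w == h2 else 6         # S5/S6 for H2, S7/S8 for H3
            if run == 2:
                out.append(s[base])
            else:
                out += bytes([s[base + 1], run])
        elif data[i:i + 2] == b"\r\n":
            out.append(s[8])                   # S9 for 0D 0A
            run = 2
        else:
            out.append(w)                      # any other word is copied unchanged
            run = 1
        i += run
    return bytes(out)

def decompress(comp: bytes, high, repl) -> bytes:
    h1, h2, h3 = high
    fixed = {repl[0]: bytes([h1]) * 2, repl[1]: bytes([h1]) * 3,
             repl[2]: bytes([h1]) * 4, repl[4]: bytes([h2]) * 2,
             repl[6]: bytes([h3]) * 2, repl[8]: b"\r\n"}
    counted = {repl[3]: h1, repl[5]: h2, repl[7]: h3}
    out = bytearray()
    i = 0
    while i < len(comp):
        w = comp[i]
        if w in counted:                       # S4, S6, S8 are followed by a count
            out += bytes([counted[w]]) * comp[i + 1]
            i += 2
        else:
            out += fixed.get(w, bytes([w]))
            i += 1
    return bytes(out)

# Hypothetical block: H1=00, H2=03, H3=FF, S1..S9 = A1..A9.
high, repl = [0x00, 0x03, 0xFF], list(range(0xA1, 0xAA))
data = bytes([0] * 7 + [0x88] + [0, 0] + [3, 3, 3] + [0xFF, 0xFF] + [0x0D, 0x0A, 0x00, 0x41])
packed = compress(data, high, repl)
print(packed.hex(" "), decompress(packed, high, repl) == data)
```

Because the replacement words never occur in the block, the decoder can tell them apart from literal words without any escape mechanism.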

【０１１５】図７は上記の圧縮方法にしたがう圧縮処理
の様子を示すものである。この図において，高頻度ワー
ドおよび置換ワード以外の数字は16進数表現されてい
る。高頻度ワード以外のワード（たとえば図７の８８）
についてはそのまま配列される。FIG. 7 shows an example of compression according to the above method. In this figure, numbers other than the high-frequency words and replacement words are in hexadecimal notation. Words other than high-frequency words (for example, 88 in FIG. 7) are placed in the output unchanged.

【０１１６】圧縮データにおいて，実圧縮データＤ１の
前に，置換ワードの列ＣＣおよび高頻度ワードの列ＣＰ
Ｃが一定の順序で配列されている。In the compressed data, the replacement word string CC and the high-frequency word string CPC are placed in a fixed order before the actual compressed data D1.

【０１１７】圧縮データの伸張処理は上記の圧縮処理手
順を逆にたどっていくことにより行われる。Decompression processing of compressed data is performed by following the above compression processing procedure in reverse.

【０１１８】(4) 圧縮データ・レコード・フォーマット図８は圧縮データ・レコード・フォーマットを示してい
る。この圧縮データ・レコードも可変長である。(4) Compressed data record format. FIG. 8 shows the compressed data record format. The compressed data record is also of variable length.

【０１１９】圧縮データ・レコードは，レコードNO. Ｒ
ＮＯ，データ・ファイルにおける先頭からのオフセット
ＯＦＦＤ，参照ファイルにおける先頭からのオフセット
ＯＦＦＳ，参照ファイルを利用して圧縮処理をしている
かどうかを表わす参照フラグＣＦ，圧縮処理をしている
かどうかを表わす圧縮フラグＣＰＦ，後に続くデータ
（置換ワード列ＣＣ，高頻度ワード列ＣＰＣおよび実圧
縮データＤ１）の長さを示すサイズ・データＳＺ（以上
を，復元用補助データという），置換ワード列ＣＣ，高
頻度ワード列ＣＰＣおよび実圧縮データＤ１から構成さ
れる。レコードＤＲ₁，ＤＲ₃，ＤＲ_nのように圧縮処
理が行われないレコードについては，復元用補助データ
ＲＮＯ，ＯＦＦＤ，ＯＦＦＳ，ＣＦ，ＣＰＦ，ＳＺに続
いて，圧縮されていないこれらのレコードのデータ（非
圧縮データ）Ｄ２が配列されることにより，圧縮データ
・レコードが作成される。A compressed data record consists of: the record number RNO; the offset OFFD from the beginning of the data file; the offset OFFS from the beginning of the reference file; the reference flag CF, which indicates whether compression used the reference file; the compression flag CPF, which indicates whether compression was performed; size data SZ, which gives the length of the data that follows (the replacement word string CC, the high-frequency word string CPC and the actual compressed data D1); and then the replacement word string CC, the high-frequency word string CPC and the actual compressed data D1. The fields RNO through SZ are called the auxiliary data for restoration. For records that are not compressed, such as DR₁, DR₃ and DRₙ, the compressed data record is created by placing the uncompressed data D2 of the record after the auxiliary data for restoration RNO, OFFD, OFFS, CF, CPF and SZ.
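One possible byte layout for this record is sketched below. The specification names the fields and their order but gives no field widths or byte order, so the widths (2, 4, 4, 1, 1 and 2 bytes), the big-endian encoding, and the names pack_record and unpack_record are all assumptions.

```python
import struct

# RNO, OFFD, OFFS, CF, CPF, SZ; widths and byte order are assumed, not specified.
HEADER = struct.Struct(">HIIBBH")

def pack_record(rno: int, offd: int, offs: int, cf: bool, cpf: bool,
                payload: bytes) -> bytes:
    """Auxiliary data for restoration followed by CC + CPC + D1 (or by D2)."""
    return HEADER.pack(rno, offd, offs, int(cf), int(cpf), len(payload)) + payload

def unpack_record(buf: bytes, pos: int = 0):
    """Return ((RNO, OFFD, OFFS, CF, CPF, payload), position of the next record)."""
    rno, offd, offs, cf, cpf, sz = HEADER.unpack_from(buf, pos)
    start = pos + HEADER.size
    return (rno, offd, offs, bool(cf), bool(cpf), buf[start:start + sz]), start + sz

rec = pack_record(2, 300, 120, True, True, b"\xa1\x02")
fields, next_pos = unpack_record(rec)
print(fields, next_pos)
```

Because SZ gives the payload length, variable-length records can be read back to back from the compressed file.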

【０１２０】後に説明するが，この実施例では参照ファ
イルＦＳを全く参照しないで圧縮する処理も行われる。
データ・ファイルＦＤ（またはその１ブロック）がそれ
自体で冗長度が高いまたは規則性がある場合には，参照
ファイルを全く用いることなく圧縮処理が可能である。
この場合にも，１ブロックごとに圧縮されることにより
得られた圧縮データは図８に示すフォーマットにしたが
って編集される。As described later, this embodiment also performs compression that does not refer to the reference file FS at all. When the data file FD (or one block of it) is itself highly redundant or regular, it can be compressed without using the reference file. In that case too, the compressed data obtained by compressing each block is edited according to the format shown in FIG. 8.

【０１２１】上述した参照フラグＣＦは，このように参
照ファイルを全く利用することなくデータ・ファイルが
それ自体で圧縮された場合，およびＤＲ₁，ＤＲ₃，Ｄ
Ｒ_nのように類似するレコードを参照ファイルで見付け
ることができなかった場合にリセットされ，参照ファイ
ルのレコードを利用して圧縮された場合にはセットされ
る。The reference flag CF is reset when the data is compressed by itself without using the reference file at all, and when no similar record could be found in the reference file, as for DR₁, DR₃ and DRₙ. It is set when the data is compressed using a record of the reference file.

【０１２２】圧縮フラグは，ＤＲ₁，ＤＲ₃，ＤＲ_nの
ように類似するレコードが参照ファイルで見付けること
ができずに圧縮処理が施されていない場合にリセットさ
れ，それ以外の参照ファイルを利用する，しないにかか
わらず圧縮処理が施されている場合にはセットされる。The compression flag CPF is reset when no similar record could be found in the reference file and no compression was applied, as for DR₁, DR₃ and DRₙ. Otherwise, when compression has been applied, whether or not the reference file was used, it is set.

【０１２３】(5) 処理の流れ図９は送信装置10におけるデータ・ファイルの圧縮処理
手順の流れを示している。(5) Flow of processing. FIG. 9 shows the flow of the data file compression procedure in the transmission device 10.

【０１２４】ファイル・オープン等の初期処理（ステッ
プ101 ）ののち，データ・ファイルＦＤから１ブロック
長のデータを読込み（ステップ103），その１ブロック
・データが参照ファイルＦＳを利用することなくそれ自
体で圧縮可能かどうかが判断される（ステップ104 ）。
これは上述したように，冗長度が高いかどうか，規則性
が強いかどうかなどを基準にして判定される。After initial processing such as opening files (step 101), one block of data is read from the data file FD (step 103), and it is judged whether that block can be compressed by itself without using the reference file FS (step 104). As described above, this is judged on criteria such as whether the redundancy is high and whether the regularity is strong.

【０１２５】それ自体で圧縮処理が可能であれば，参照
フラグＣＦがリセットされ，圧縮フラグＣＰＦがセット
され（ステップ105 ），圧縮処理に進み，上述した圧縮
処理または公知の手法によるデータ圧縮が行われる（ス
テップ110 ）。圧縮されたデータは上述した圧縮データ
・レコード・フォーマットにしたがって編集され，圧縮
ファイル（ＦＣ）の所定位置に（たとえばレコードNO.
順にしたがう位置に）格納される（ステップ112 ）。If the block can be compressed by itself, the reference flag CF is reset and the compression flag CPF is set (step 105), and the data is compressed by the compression process described above or by a known method (step 110). The compressed data is edited according to the compressed data record format described above and stored at a predetermined position in the compressed file FC, for example at the position given by record number order (step 112).

【０１２６】それ自体で圧縮が可能でなければ，参照フ
ァイルＦＳがサーチされ，類似するレコードが参照ファ
イルＦＳに存在するかどうかがチェックされる（ステッ
プ106 ）。If the block cannot be compressed by itself, the reference file FS is searched to check whether it contains a similar record (step 106).

【０１２７】類似するレコードがあれば，上述したよう
にデータ・ファイルＦＤと参照ファイルＦＳの類似する
レコード間でＸＯＲ演算が行われ，中間データが得られ
る（ステップ108 ）。参照フラグＣＦおよび圧縮フラグ
ＣＰＦがともにセットされ（ステップ109 ），上述した
または公知の手法によりデータ圧縮が施され（ステップ
110 ），圧縮データ・レコード・フォーマットにしたが
って圧縮ファイル（ＦＣ）の所定位置に書込まれる（ス
テップ112 ）。If there is a similar record, the XOR operation is performed between the similar records of the data file FD and the reference file FS as described above, yielding intermediate data (step 108). The reference flag CF and the compression flag CPF are both set (step 109), the intermediate data is compressed by the method described above or by a known method (step 110), and the result is written to the predetermined position in the compressed file FC according to the compressed data record format (step 112).

【０１２８】類似するレコードが見付からない場合に
は，参照フラグＣＦおよび圧縮フラグＣＰＦがともにリ
セットされ（ステップ111 ），圧縮処理されることな
く，圧縮データ・レコード・フォーマットにしたがって
圧縮ファイルＦＣに書込まれる（ステップ112 ）。If no similar record is found, the reference flag CF and the compression flag CPF are both reset (step 111), and the data is written, uncompressed, to the compressed file FC according to the compressed data record format (step 112).

【０１２９】以上の処理は１ブロック長ごとに繰返し実
行され，データ・ファイルＦＤ中のすべてのデータにつ
いて処理が終了すれば（ステップ102 ），ファイル・ク
ローズ等の終了処理（ステップ113 ）をもってすべての
処理が終る。The above processing is repeated for each block, and when all data in the data file FD has been processed (step 102), termination processing such as closing files (step 113) completes the procedure.
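The per-block decision of FIG. 9 can be sketched as follows. This is only an outline under stated assumptions: zlib stands in for "a known method" of compression, the self-compressibility test (zlib at least halves the block) is an invented criterion, and find_similar is a hypothetical callable that returns (OFFS, reference record of the same length) or None.

```python
import zlib

def is_self_compressible(block: bytes) -> bool:
    # Assumed criterion for step 104: a general-purpose compressor halves the block.
    return len(zlib.compress(block)) * 2 <= len(block)

def compress_block(block: bytes, find_similar, compress=zlib.compress):
    """Return (CF, CPF, OFFS, payload) for one block, following steps 104 to 111."""
    if is_self_compressible(block):                      # step 104
        return False, True, 0, compress(block)           # steps 105 and 110
    hit = find_similar(block)                            # step 106
    if hit is not None:
        offs, sr = hit
        xrd = bytes(a ^ b for a, b in zip(block, sr))    # step 108
        return True, True, offs, compress(xrd)           # steps 109 and 110
    return False, False, 0, block                        # step 111: stored as is

# Hypothetical use: an all-zero block is compressed by itself.
print(compress_block(bytes(64), lambda b: None)[:2])
```

The tuple returned here supplies the CF, CPF and OFFS fields and the payload of the compressed data record written in step 112.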

【０１３０】図10は受信装置20における受信した圧縮フ
ァイルの復元処理の手順を示している。FIG. 10 shows the procedure of the decompression process of the received compressed file in the receiving device 20.

【０１３１】初期処理ののち（ステップ121 ），圧縮フ
ァイルＦＣから１圧縮データ・レコードのデータを読込
み（ステップ123 ），圧縮フラグＣＰＦを参照して圧縮
処理が施されているかどうかが判定される（ステップ12
4 ）。After initial processing (step 121), the data of one compressed data record is read from the compressed file FC (step 123), and the compression flag CPF is checked to judge whether compression was applied (step 124).

【０１３２】圧縮処理が施されているものであればデー
タ伸張処理が実行され（ステップ125 ），そうでなけれ
ばこのステップ125 はスキップされる。If the compression processing has been performed, the data expansion processing is executed (step 125), and if not, this step 125 is skipped.

【０１３３】続いて，参照フラグＣＦの状態をみて，参
照ファイルＦＳを利用した圧縮かどうかがチェックされ
（ステップ126 ），そうであれば参照ファイル・オフセ
ットＯＦＦＳを用いて対応するレコードが参照ファイル
ＦＳから読出され（ステップ127 ），伸張されたデータ
と参照ファイルＦＳのレコードとのＸＯＲ演算が行われ
る（ステップ128 ）。これにより，元のデータが復元す
るので，データ・ファイル・オフセットＯＦＦＤを参照
してデータ・ファイルＦＤの該当場所に書込まれる（ス
テップ129 ）。Next, the reference flag CF is examined to check whether the compression used the reference file FS (step 126). If so, the corresponding record is read from the reference file FS using the reference file offset OFFS (step 127), and the decompressed data is XORed with that record (step 128). This restores the original data, which is written to the appropriate location in the data file FD by referring to the data file offset OFFD (step 129).

【０１３４】データ圧縮されていない場合には実データ
がそのまま，参照ファイルを利用しないでそれ自体で圧
縮されている場合にはステップ125 で伸張されたデータ
が，データ・ファイル・オフセットＯＦＦＤを手がかり
にデータ・ファイルＦＤの元の場所に書込まれる（ステ
ップ129 ）。If the data was not compressed, the actual data is written as it is; if it was compressed by itself without using the reference file, the data decompressed in step 125 is written. In both cases the data is written to its original location in the data file FD, located by the data file offset OFFD (step 129).

【０１３５】以上の処理は圧縮ファイルＦＣの圧縮デー
タ・レコードごとに実行され，すべての圧縮データにつ
いて処理が終了すれば（ステップ122 ），終了処理（ス
テップ130 ）を経てすべての処理を終る。The above processing is performed for each compressed data record of the compressed file FC, and when all compressed data records have been processed (step 122), termination processing (step 130) completes the procedure.
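The per-record restoration of FIG. 10 can be sketched in the same spirit. zlib again stands in for the decompression method, and the names restore_record and write_at are hypothetical; the sketch assumes the flags and offsets have already been read from the record's auxiliary data for restoration.

```python
import zlib

def restore_record(cf: bool, cpf: bool, offs: int, payload: bytes, fs: bytes,
                   decompress=zlib.decompress) -> bytes:
    """Rebuild one data record from a compressed data record (steps 124 to 128)."""
    data = decompress(payload) if cpf else payload       # steps 124 and 125
    if cf:                                               # step 126
        sr = fs[offs:offs + len(data)]                   # step 127
        data = bytes(a ^ b for a, b in zip(data, sr))    # step 128
    return data

def write_at(out: bytearray, offd: int, data: bytes) -> None:
    """Place the restored record at offset OFFD of the output file image (step 129)."""
    end = offd + len(data)
    if len(out) < end:
        out.extend(bytes(end - len(out)))
    out[offd:end] = data

# Hypothetical reference file and a record compressed against it.
fs = b"ABCDEFGH"
xrd = bytes(a ^ b for a, b in zip(b"CDxF", fs[2:6]))
out = bytearray()
write_at(out, 0, restore_record(True, True, 2, zlib.compress(xrd), fs))
print(bytes(out))
```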

【図面の簡単な説明】[Brief description of drawings]

【図１】データ・ファイルの通信システムを示すブロッ
ク図である。FIG. 1 is a block diagram illustrating a data file communication system.

【図２】参照ファイルを利用したデータ圧縮処理を説明
するものである。FIG. 2 illustrates a data compression process using a reference file.

【図３】参照ファイルを利用したデータ伸張処理を説明
するものである。FIG. 3 illustrates a data decompression process using a reference file.

【図４】データ・ファイルと参照ファイルの１ブロック
・データを示す。FIG. 4 shows one block data of a data file and a reference file.

【図５】類似レコード検索処理の第１段階を示す。FIG. 5 shows a first stage of similar record search processing.

【図６】類似レコード検索処理の第２段階を示す。FIG. 6 shows a second stage of similar record search processing.

【図７】原データとそれをデータ圧縮して得られる圧縮
データとの例を示す。FIG. 7 shows an example of original data and compressed data obtained by compressing the original data.

【図８】圧縮データ・レコード・フォーマットを示す。FIG. 8 shows a compressed data record format.

【図９】送信装置におけるデータ・ファイルの圧縮処理
手順を示すフロー・チャートである。FIG. 9 is a flow chart showing a compression processing procedure of a data file in the transmission device.

【図１０】受信装置における圧縮ファイルの伸張処理手
順を示すフロー・チャートである。FIG. 10 is a flowchart showing a procedure of decompressing a compressed file in the receiving device.

[Explanation of symbols]

10 送信装置 20 受信装置 10 transmitter 20 receiver

Claims

[Claims]

1. A data compression method comprising: partially comparing a data file to be data-compressed with a reference file corresponding to the data file, and searching the data file for records of arbitrary data length that have a high data matching rate with a part of the reference file; when a record with a high data matching rate is found, performing an exclusive OR operation between the record found in the data file and the corresponding record in the reference file, data-compressing the data obtained by the exclusive OR operation, and creating a compressed data record by adding auxiliary data for restoration to the compressed data obtained by the data compression; for records in the data file that have a low matching rate with every part of the reference file, creating a compressed data record by adding auxiliary data for restoration to those records; and creating a compressed file by editing these compressed data records.

2. A data compression method comprising: determining whether the data file or a part thereof can be compressed by itself without using the reference file; if it can, compressing the data file or the part thereof by itself; and if it cannot, executing the data compression method according to claim 1.

3. The data compression method according to claim 1 or 2, wherein the auxiliary data for restoration includes a record number, data indicating the position in the data file of the record with a high matching rate, data indicating the position in the reference file of the record with a high matching rate, data indicating whether compression processing using the reference file has been performed, data indicating whether the data has been compressed, data indicating a data size, and data relating to the data compression processing.

4. A data decompression method comprising: checking, for each compressed data record of a compressed file, whether decompression processing is required by referring to its auxiliary data for restoration; when decompression processing is determined to be required, decompressing the compressed data record and performing an exclusive OR operation between the decompressed record and the corresponding record in the reference file to create a data record; and restoring the data file by editing, with reference to the auxiliary data for restoration, the data records created by the above process and the data records contained in compressed data records that do not require decompression processing.

5. The data decompression method according to claim 4, wherein, when the data file or a part thereof has been compressed by itself, the compressed data record is decompressed to restore the data file without using the reference file.

6. The data decompression method according to claim 4 or 5, wherein the auxiliary data for restoration includes a record number, data indicating the position in the data file of the record with a high matching rate, data indicating the position in the reference file of the record with a high matching rate, data indicating whether compression processing using the reference file has been performed, data indicating whether the data has been compressed, data indicating a data size, and data relating to the data compression processing.

7. A data transmission method in which a transmitting device that transmits compressed data and a receiving device that receives the compressed data transmitted from the transmitting device both hold the reference file, data is compressed according to the data compression method according to claim 1, and the compressed data obtained by this processing is transmitted from the transmitting device to the receiving device.

8. A data transmission method in which a transmitting device that transmits compressed data and a receiving device that receives the compressed data transmitted from the transmitting device both hold the reference file, data is compressed according to the data compression method according to claim 1 or 2, the compressed data obtained by this processing is transmitted from the transmitting device to the receiving device, and the receiving device decompresses the received data according to the data decompression method according to claim 4.

9. A data compression device comprising: searching means for partially comparing a data file to be compressed with a corresponding reference file and searching the data file for a record of arbitrary data length that has a high data matching rate with a part of the reference file; data compression means for, when the searching means finds a record having a high data matching rate, performing an exclusive OR operation between the record found in the data file and the corresponding record in the reference file, and compressing the data obtained by the exclusive OR operation; and editing means for creating a compressed data record by adding auxiliary data for restoration to the compressed data obtained by the data compression means, creating a compressed data record by adding auxiliary data for restoration to each record in the data file that has a low matching rate with every part of the reference file, and creating a compressed file by editing these compressed data records.

10. The data compression device according to claim 9, wherein the auxiliary data for restoration includes a record number, data indicating the position in the data file of the record with a high matching rate, data indicating the position in the reference file of the record with a high matching rate, data indicating whether compression processing using the reference file has been performed, data indicating whether the data has been compressed, data indicating a data size, and data relating to the data compression processing.

11. A data decompression device comprising: judging means for determining, for each compressed data record of a compressed file, whether decompression processing is required by referring to its auxiliary data for restoration; data decompression means for, when the judging means determines that decompression processing is required, decompressing the compressed data record and performing an exclusive OR operation between the decompressed record and the corresponding record in the reference file to create a data record; and editing means for restoring the data file by editing, with reference to the auxiliary data for restoration, the data records created by the data decompression means and the data records contained in compressed data records that do not require decompression processing.

12. The data compression method according to claim 1, wherein the process of finding, in the data file, a record of arbitrary data length that has a high data matching rate with a part of the reference file includes: extracting first partial data of a first predetermined length from the data file and first partial data of the first predetermined length from the reference file; comparing the first partial data extracted from the data file and from the reference file with each other while shifting at least the extraction position in the reference file by a predetermined number of bytes, thereby checking the matching rate of the first partial data; when first partial data with a high matching rate is found, fixing the shift amount used for extracting the first partial data; extracting, in the vicinity of the fixed extraction position, second partial data of a second predetermined length, which is much shorter than the first predetermined length, from the data file and from the reference file; and comparing the second partial data extracted from the data file and from the reference file with each other while shifting their extraction positions in the data file and the reference file by a predetermined number of bytes, thereby determining the range of the record with a high matching rate in the data file and in the reference file.

13. The data compression method according to claim 12, wherein one block of data having a fixed data length is extracted from the data file, the extracted block is checked while shifting the extraction position of the first partial data in the reference file, and, if no first partial data with a high matching rate is found, the position of the block to be extracted from the data file is shifted by one block length and the checking process is repeated.
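
The compression and restoration steps recited in the method claims can be illustrated with a minimal Python sketch. It is not the claimed implementation. The names `match_rate`, `find_best_offset`, `compress_record` and `restore_record`, the 0.8 threshold, and the dictionary used for the auxiliary data are all hypothetical, and `zlib` stands in for whatever compressor is applied to the exclusive OR result. The search here is a plain byte-by-byte slide over the reference file, not the two-stage search of claim 12.

```python
import zlib


def match_rate(a: bytes, b: bytes) -> float:
    """Fraction of byte positions at which a and b agree."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def find_best_offset(record: bytes, reference: bytes, step: int = 1):
    """Slide the record over the reference; return (offset, rate) of the best match."""
    best_off, best_rate = 0, 0.0
    for off in range(0, len(reference) - len(record) + 1, step):
        rate = match_rate(record, reference[off:off + len(record)])
        if rate > best_rate:
            best_off, best_rate = off, rate
    return best_off, best_rate


def compress_record(record: bytes, reference: bytes, threshold: float = 0.8) -> dict:
    """Build one compressed data record (hypothetical layout for the auxiliary data)."""
    off, rate = find_best_offset(record, reference)
    if rate >= threshold:
        # High matching rate: XOR against the reference so matching bytes become 0,
        # then compress the mostly-zero result (zlib is an assumed stand-in).
        ref_part = reference[off:off + len(record)]
        xored = bytes(x ^ y for x, y in zip(record, ref_part))
        return {"uses_reference": True, "ref_offset": off,
                "size": len(record), "payload": zlib.compress(xored)}
    # Low matching rate: store the record as-is with its auxiliary data.
    return {"uses_reference": False, "ref_offset": None,
            "size": len(record), "payload": record}


def restore_record(rec: dict, reference: bytes) -> bytes:
    """Invert compress_record using the same reference file."""
    if not rec["uses_reference"]:
        return rec["payload"]
    xored = zlib.decompress(rec["payload"])
    off = rec["ref_offset"]
    ref_part = reference[off:off + rec["size"]]
    return bytes(x ^ y for x, y in zip(xored, ref_part))


if __name__ == "__main__":
    reference = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789" * 4
    record = bytearray(reference[10:60])
    record[5] ^= 0xFF  # simulate a one-byte update to the old data
    rec = compress_record(bytes(record), reference)
    assert restore_record(rec, reference) == bytes(record)
```

Because the exclusive OR is its own inverse, applying it again with the same part of the reference file recovers the record exactly, which is why both the transmitting and the receiving side must hold an identical reference file.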