JPH0936747A

JPH0936747A - Data compression method and data compressor

Info

Publication number: JPH0936747A
Application number: JP7181485A
Authority: JP
Inventors: Hidetoshi Sakakibara; 秀敏榊原; Atsuko Toda; 亜津子戸田
Original assignee: Toshiba Corp; Toshiba Computer Engineering Corp
Current assignee: Toshiba Corp; Toshiba Computer Engineering Corp
Priority date: 1995-07-18
Filing date: 1995-07-18
Publication date: 1997-02-07

Abstract

PROBLEM TO BE SOLVED: To provide a data compression method and a data compressor that compress data at high speed, without spending much time on data retrieval, by using a hash method for retrieval in data compression by the LZ slide dictionary method. SOLUTION: When a data input/output device 1 reads the data to be compressed into a memory 3, a control circuit 2 retrieves, from a hash table by the hash method, a data string identical to the compression target data string, compares the retrieved data string with the compression target data string one data item at a time, and outputs the length over which the two strings match. It repeats the retrieval and comparison until no retrieval candidates remain, selects from the data strings starting at the retrieved head positions the one whose matched length is longest, outputs the head position of the selected data string and the matched length as match information, and registers the head position of the compression target data string in the hash table as a retrieval candidate.

Description

Detailed Description of the Invention

[0001]

[Technical Field] The present invention relates to a data compression method and a data compression apparatus for compressing data such as text data.

[0002]

[Description of the Related Art] One example of a data compression method for compressing data such as text data is the LZ (Lempel-Ziv) slide dictionary method. As shown in FIG. 12, this method scans the data to be compressed from its beginning. When a data string that has already appeared appears again, the method outputs, in place of the reappearing data string, match information consisting of the head address of the earlier data string and the length of data over which the two strings match, thereby compressing the data.
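As a minimal sketch of this replacement (not taken from the specification), the following Python fragment expands a stream in which a repeated string has been replaced by match information. The sample stream, the tuple representation `(address, length)`, and the function name `expand` are assumptions made for illustration.

```python
def expand(tokens):
    """Rebuild the original text from literals and (address, length) match information."""
    out = []
    for tok in tokens:
        if isinstance(tok, tuple):
            address, length = tok
            # Copy `length` items starting at the head address of the earlier string.
            for i in range(length):
                out.append(out[address + i])
        else:
            out.append(tok)  # literal character, stored as is
    return "".join(out)

# Hypothetical stream: four literals, one match (address 0, length 3), one literal.
tokens = ["A", "B", "C", "D", (0, 3), "E"]
restored = expand(tokens)
print(restored)  # prints ABCDABCE
```

The second occurrence of "ABC" costs one item of match information instead of three characters, which is where the compression comes from.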

[0003] In the LZ slide dictionary method, various search methods are used to determine whether the data string to be compressed has appeared before. For example, a search method using a binary tree builds a search path as a tree structure in which each character code appearing in the input data serves as a head (root node) and which branches successively into two nodes. For a given character code and the character code that follows it, the node to which the following character code is branched and placed is decided by comparing the magnitudes of the two character codes when they are expressed, for example, as hexadecimal codes.

[0004]

[Problems to be Solved by the Invention] However, searching for one data string with the binary tree described above requires comparing the characters placed at the nodes one by one. Consequently, when the tree has grown in an unbalanced way (only part of the search path has become long), it takes a very long time to find data placed at the final node (leaf node) of that search path.

[0005] A search method using a trie is also sometimes adopted, and a trie allows faster searching than a binary tree. However, a trie has the drawback of requiring an enormous memory space.

[0006] The present invention solves the above problems. Its object is to provide a data compression method and a data compression apparatus that, when compressing data by the LZ slide dictionary method, can search data at high speed and can keep the memory space required for the search as small as possible.

[0007]

[Means for Solving the Problems] The data compression method of the present invention is a data compression method for compressing data such as document data using the LZ slide dictionary method, characterized by comprising: a search step of searching the already-compressed data, by a hash method, for a candidate data string identical to the compression target data string, which starts at the position following the already-compressed data within the data to be compressed; a comparison step of comparing the data string found in the search step with the compression target data string to obtain the length of data over which they match; a repetition step of repeatedly executing the search step and the comparison step until no search candidates for a data string identical to the compression target data string remain in the already-compressed data; a selection step of selecting and outputting the data string with the longest matched data length among the matched data lengths obtained in the comparison step; and a registration step of registering the compression target data string in a hash table as a search candidate for the hash method.
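The five steps can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the hash key is taken to be the XOR of two adjacent bytes, the hash table and its chain are modeled together as a dictionary of position lists, and all function names and the sample data are hypothetical.

```python
from collections import defaultdict

def hash2(data, pos):
    """Hash key for the string starting at pos (assumed: XOR of two bytes)."""
    return data[pos] ^ data[pos + 1]

def match_length(data, cand, pos):
    """Comparison step: count matching items, one item at a time."""
    n = 0
    while pos + n < len(data) and data[cand + n] == data[pos + n]:
        n += 1
    return n

def longest_match(data, pos, table):
    """Search, repetition and selection steps for the string starting at pos."""
    best_pos, best_len = -1, 0
    for cand in table[hash2(data, pos)]:        # repeat until no candidates remain
        length = match_length(data, cand, pos)  # comparison step
        if length > best_len:                   # selection step: keep the longest
            best_pos, best_len = cand, length
    return best_pos, best_len

def register(data, pos, table):
    """Registration step: store only the head position, never the data itself."""
    table[hash2(data, pos)].append(pos)

data = b"ABCDABCE"              # hypothetical sample input
table = defaultdict(list)
for pos in range(4):            # positions 0..3 have already been processed
    register(data, pos, table)
result = longest_match(data, 4, table)
print(result)                   # prints (0, 3)
```

Only head positions are stored, so the comparison step always reads the actual characters back from the input buffer.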

[0008] In this case, the repetition step is preferably configured to terminate after the search and comparison have been repeated a predetermined number of times. The cutoff count for the search and comparison is preferably set to 3 to 20. Furthermore, the registration step is preferably configured to register the compression target data string at the position that is searched first among the search candidates of the hash method. In addition, the cutoff count in the repetition step may be automatically optimized and determined by statistically storing the number of searches that had been performed when the longest matched data length was detected in the comparison step.

[0009] The data compression apparatus of the present invention is a data compression apparatus for compressing data such as document data using the LZ slide dictionary method, characterized by comprising: search means for searching the already-compressed data, by a hash method, for a candidate data string identical to the compression target data string, which starts at the position following the already-compressed data within the data to be compressed; comparison means for comparing the data string found by the search means with the compression target data string to obtain the length of data over which they match; repetition means for repeatedly executing the search and comparison until no search candidates for a data string identical to the compression target data string remain in the already-compressed data; selection means for selecting and outputting the data string with the longest matched data length among the matched data lengths obtained by the comparison means; and registration means for registering the compression target data string in a hash table as a search candidate for the hash method.

[0010] In this configuration, the repetition means is preferably configured to terminate further search and comparison once the search and comparison have been repeated a predetermined number of times. The cutoff count in the repetition means is preferably set to any value from 3 to 20. Furthermore, the registration means may be configured to register the compression target data string at the position that is searched first among the search candidates of the hash method. It is also a preferable configuration to determine the cutoff count in the repetition means by automatic optimization, by statistically storing the number of searches that had been performed when the comparison means detected the longest matched data length.

[0011] According to the above means, a data string identical to the compression target data string is searched for in the already-compressed data by the hash method, so the data string can be found at high speed. During this hash search a hash table is created in the memory space, but only the positions of the candidate data strings (their positions in the already-compressed data) are registered in it, not the actual data, so the hash table does not use much memory. Likewise, the hash chain only registers the positions of the candidate data strings in chain form and not the actual data, so it does not use much memory either.

[0012] If the search and comparison are terminated after being repeated a predetermined number of times, the data search can be executed even faster. Furthermore, setting the cutoff count for the search and comparison to 3 to 20 keeps the data compression efficiency sufficiently good while also making the time required for compression sufficiently short. In addition, registering the compression target data string at the position that is searched first among the search candidates of the hash method raises the probability of obtaining the longest matched data length early. Moreover, if the cutoff count in the repetition step is automatically optimized and determined by statistically storing the number of searches that had been performed when the longest matched data length was detected in the comparison step, the data compression efficiency and the compression speed can be optimized.
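The effect of the cutoff count can be sketched as follows. This is a hypothetical illustration: the function name `longest_match_capped`, the parameter `max_tries`, the sample data, and the representation of the candidates as a Python list ordered from most recently registered to oldest are all assumptions, not details given in the specification.

```python
from itertools import islice

def longest_match_capped(data, pos, candidates, max_tries):
    """Examine at most max_tries candidates and return (position, length) of the best."""
    best_pos, best_len = -1, 0
    for cand in islice(candidates, max_tries):   # stop after the cutoff count
        n = 0
        while pos + n < len(data) and data[cand + n] == data[pos + n]:
            n += 1
        if n > best_len:
            best_pos, best_len = cand, n
    return best_pos, best_len

data = b"ABCD-AB-ABCD"   # hypothetical input; the target string starts at position 8
candidates = [5, 0]      # earlier "AB" positions, most recently registered first

fast = longest_match_capped(data, 8, candidates, max_tries=1)
thorough = longest_match_capped(data, 8, candidates, max_tries=2)
print(fast, thorough)    # prints (5, 2) (0, 4)
```

With a cutoff of 1 the search stops after the newest candidate and settles for a match of length 2; allowing a second try finds the length-4 match. A small cutoff trades some compression efficiency for speed, which is the balance the 3 to 20 range is meant to strike.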

[0013]

[Embodiments of the Invention] A first embodiment of the present invention will be described below with reference to FIGS. 1 to 10. In FIG. 1, which shows the electrical configuration of the data compression apparatus, a data input/output device 1 serving as data input/output means comprises, for example, a SCSI interface. It has the function of inputting and outputting data to and from an external storage device (not shown), such as a hard disk or a floppy disk, in which the data file to be compressed is stored. The data input/output device 1 is connected to a control circuit 2, which comprises a microcomputer or the like, via address and data bus lines and control signal lines. A memory 3 is connected to the control circuit 2 via address and data bus lines and control signal lines.

[0014] The control circuit 2 stores a control program for controlling execution of the data compression processing. The control circuit 2 is configured to function as search logic 4 (the search means), match comparison logic 5 (the comparison means), selection logic 6 (the selection means), data compression logic 7, and registration logic 8 (the registration means). The control circuit 2 also functions as the repetition means. The function of each logic is described below. First, the search logic 4 performs hashing (computation by a hash function) on the compression target data string in the compression target data file read into the memory 3. It then searches for a data string identical to the compression target data string among the data strings that have been registered in a search table, described later, by writing position information indicating their head positions.

[0015] The match comparison logic 5 compares the compression target data string with the data string starting at the head position found by the search logic 4, one data item (for example, one character) at a time, and outputs the length of data over which the two match. The selection logic 6 selects, from among the data strings starting at the head positions found by the search logic 4, the data string for which the matched data length output by the match comparison logic 5 is longest. The data compression logic 7 outputs, as match information, the head position of the data string selected by the selection logic 6 and the matched data length that the match comparison logic 5 output for that data string. The registration logic 8 registers the head position of the compression target data string in the hash table as position information serving as a search candidate.

[0016] Next, the operation of the above embodiment will be described with reference to FIGS. 2 to 10. The flowcharts in FIGS. 2 and 3 show the control contents of the control program of the control circuit 2, that is, an outline of the data compression processing. As the data (file) to be compressed, data consisting of a sequence of characters as shown in FIG. 4(a) is used. To compress this character string data with the data compression apparatus, the operator operates input means (not shown), such as a keyboard, of the data compression apparatus to specify the file name of the data file to be compressed and to enter the command that executes the compression processing. The control circuit 2 then reads the specified data file from the external storage device via the data input/output device 1, transfers it to the memory 3, and starts the data compression processing.

[0017] First, in the flowchart of FIG. 2, processing step R1, "initial processing", is executed. Here the control circuit 2 initializes, among other things, the area of the memory 3 used for the compression processing. The process then moves to processing step R2, "search processing". This "search processing" is defined as a subroutine shown by the flowchart of FIG. 3 and is described below following that flowchart.

[0018] As shown in FIG. 3, the control circuit 2 initializes the area of the memory 3 related to the search processing (step S1) and then executes "data search" (step S2). Here, in order to perform hashing, the search logic 4 reads from the memory 3 two characters of the data string shown in FIG. 4(a): the character data "A" at address 0 and the character data "B" at address 1. In this case exclusive OR is chosen as the hash function: the JIS code 0x41 of the character data "A" at address 0 and the JIS code 0x42 of the character data "B" at address 1 are XORed to obtain the hash value 0x03. The hash table is then searched using this hash value 0x03.
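The hash computation described here can be reproduced with a short Python sketch. The function name `xor_hash` is hypothetical, and Python's `ord` is used on the assumption that the single-byte codes of "A" and "B" (0x41 and 0x42) are the same in ASCII as the JIS codes quoted in the text.

```python
def xor_hash(first, second):
    """Hash two adjacent characters by XORing their character codes."""
    return ord(first) ^ ord(second)

h = xor_hash("A", "B")   # 0x41 XOR 0x42
print(f"0x{h:02X}")      # prints 0x03
```

Because the result of XORing two 8-bit codes is itself an 8-bit value, the hash value can be used directly as an index into a table of at most 256 entries.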

[0019] The hash table is a table (data table), laid out in the memory 3, in which the hash value serves as the index and the head position (address) of a character string having that hash value is registered as position information. In the initialization of step R1, data with all bits set to "1" (0xFFFF...), for example, is written to the hash table and to the hash chain described later, which represents the state in which no data is registered. Therefore, when the hash table is searched with the hash value 0x03 at this point and the registered data is read, "no registered data" is detected (see FIG. 5). The hash table and the hash chain together are defined as the search table (the search target).
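A minimal sketch of this initialization and lookup, assuming 16-bit table entries and a 256-entry table (the text states only that every bit is set to "1"; the width, the table size, and the names `EMPTY`, `hash_table` and `lookup` are assumptions):

```python
EMPTY = 0xFFFF     # assumed 16-bit "no registered data" code (all bits 1)
TABLE_SIZE = 256   # assumed: one entry per possible 8-bit hash value

# Step R1: fill the table with the "no registered data" code.
hash_table = [EMPTY] * TABLE_SIZE

def lookup(index):
    """Return the registered head position, or None for "no registered data"."""
    entry = hash_table[index]
    return None if entry == EMPTY else entry

before = lookup(0x03)   # nothing registered yet
hash_table[0x03] = 0    # register head position 0 for hash value 0x03
after = lookup(0x03)
print(before, after)    # prints None 0
```

Using an all-ones code rather than zero as the empty marker keeps position 0, the very first address of the data, available as a valid entry.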

[0020] Next, the determination "no registered data?" is made (step S3). Since "no registered data" has been detected at this point, the process proceeds to "YES" at step S3 and moves to step R3 shown in FIG. 2, where "search result output" is executed. Here the data compression logic 7 outputs the result of the search processing performed in step R2 as an output data string in the format shown in FIG. 4(b). As shown in FIG. 4(b), a management part is placed at address 0 of the compressed output data string; the setting of this management part is described later.

[0021] In step R3, data is output after determining whether the longest matched data length resulting from the search processing of step R2 is at least a predetermined value (for example, "3"). Specifically, at this point no registered data was detected at step S3, so no match information is output. The longest matched data length is still at its initialized value "0", which is less than the predetermined value "3", so the character data "A" is output as is to address 1 of the output data string. Then "registration in the hash table" is executed (step R4). Here position information is registered in the hash table. This registration is performed as illustrated in the conceptual diagrams of FIGS. 6 to 10.

[0022] Specifically, the search result at this point was that no data was registered for index 0x03 of the hash table. Position information "0", indicating the position (address) of the leading data "A", is therefore registered in the hash table by overwriting the code that indicates "no registered data" (see FIG. 6). The pointer into the compression target data string is then incremented so that the search logic 4 next looks at the character data "B" at address 1. The loop counter is also incremented.

[0023] Next, the determination "looped for the number of processed data?" is made (step R5). Here the number of processed data is compared with the count value of the loop counter to determine whether the hash table registration of step R4 has been looped as many times as the number of processed data. When the longest matched data length obtained by the selection logic 6 is "3" or more, compression is performed as described later, so the number of processed data is set equal to that matched data length. When the longest matched data length is "0" or "2", no compression is performed and one character data is output as is, so the number of processed data is set to "1" (a longest matched data length of "1" is never output).
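The rule for the number of processed data can be sketched as follows. The function name `emit`, the list-based output, and the sample strings are hypothetical; only the threshold "3" and the two outcomes come from the text.

```python
MIN_MATCH = 3  # the predetermined value given as an example in the text

def emit(output, data, pos, match_pos, match_len):
    """Append one output item and return the number of input items processed."""
    if match_len >= MIN_MATCH:
        output.append((match_pos, match_len))  # match information replaces the string
        return match_len
    output.append(data[pos])                   # too short: output one character as is
    return 1

out = []
data = "ABCDABCE"                        # hypothetical sample input
step1 = emit(out, data, 0, -1, 0)        # no candidate found at position 0
step2 = emit(out, data, 4, 0, 3)         # match of length 3 found at position 4
print(step1, step2, out)                 # prints 1 3 ['A', (0, 3)]
```

The returned count is how many positions must then be registered in the hash table before the next search starts, which is what the loop counter of step R5 tracks.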

[0024] At this point the longest matched data length is "0" and the number of processed data is set to "1", which equals the count value "1" of the loop counter. The process therefore proceeds to "YES" at determination step R5 and makes the determination "end of data?" (step R6). Here it is determined whether the compression target data string has been processed to the end. Since it has not, the process proceeds to "NO" at step R6 and returns to step R2. Steps R4 and R5 correspond to the registration step.

[0025] In step R2, as before, the longest matched data length is initialized in step S1, and the process moves to step S2. This time hashing is performed on the character data "B" at address 1 and the character data "C" at address 2, giving the hash value 0x01. The hash table is searched with index 0x01, but "no registered data" is again detected, so the process proceeds to "YES" at the next step S3 and moves to step R3.

[0026] In step R3, as before, the character data "B" is output as is to address 2 of the output data string. The process then moves to step R4, where position information "1", indicating the position (address) of the character data "B", is written to and registered in the area of index 0x01 of the hash table. The pointer into the compression target data string is incremented so that the character data "C" at address 2 is looked at next, and the process moves to step R5. Since only one character was processed, the process proceeds to "YES" at step R5 and moves to step R6. At step R6 the data has not yet ended, so the process proceeds to "NO" and returns to step R2.

[0027] The same applies to the next character string "CD" at addresses 2 and 3 and to the character string "DA" at addresses 3 and 4: no match information is output, the character data "C" and "D" are output as is to the output data string, and position information "2" and "3" is written to the areas indicated by the respective hash value indexes in the hash table.

[0028] When the pointer into the compression target data string reaches "4", hashing is performed in step S2 on the character string "AB" at addresses 4 and 5. This yields the hash value 0x03, so searching the hash table returns the position information "0" of the registered data (see FIG. 6). The process therefore proceeds to "NO" at the next determination step S3 and moves to determination step S4, "same data string?".

[0029] Because the hash method maps a large set onto a small set, different data strings can have the same hash value (a collision). In this embodiment, the position information of all character strings that have the same hash value is registered in chained data linked to the same index, a so-called hash chain (hereinafter simply called the chain). Determination step S4 therefore checks whether the data string at position information "0" is the same as the current compression target data string. To do so, the match comparison logic 5 reads the data "A" at address 0 of the compression target data and compares it with the data "A" at address 4, obtaining a match. If the first data items match under the same hash value, the second data items necessarily match as well (with the XOR hash, equal hash values and equal first codes leave only one possible second code), so the check ends here. The process then moves to processing step S5, "compare one data item at a time".
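Why checking the first data item is enough can be seen in a short sketch. The function names are hypothetical, and the example strings "AB" and "BA" are chosen here only to show a collision; they are not taken from the specification.

```python
def xor_hash(pair):
    """XOR hash of a two-character string."""
    return ord(pair[0]) ^ ord(pair[1])

def second_code(first_code, hash_value):
    """Recover the second character code from the first code and the hash value."""
    return first_code ^ hash_value

# A collision: different strings, same hash value.
collide = xor_hash("AB") == xor_hash("BA")

# Given the hash value 0x03 and the first code 0x41 ("A"),
# the second code can only be 0x42 ("B").
recovered = second_code(0x41, 0x03)
print(collide, hex(recovered))   # prints True 0x42
```

"AB" and "BA" share the hash value 0x03 but differ in their first character, which is exactly the case that step S4 filters out with a single comparison.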

[0030] In processing step S5, the third and subsequent data items of the character strings are compared one at a time. The match comparison logic 5 reads the data "C" at address 2 and the data "C" at address 6, compares them, and obtains a match. It then reads the data "D" at address 3 and the data "E" at address 7 and obtains a mismatch, so the matched data length so far, "3", is written to and stored in the storage area for the current matched data length in the memory 3. Steps S4 and S5 correspond to the comparison step.

[0031] Next, the determination "longest match length?" is made (step S6). Here the selection logic 6 determines whether the current matched data length "3" obtained in step S5 is greater than the value stored in the longest matched data length area. The longest matched data length was initialized in step S1 and is "0", so the process proceeds to "YES" at step S6 and executes "update match position and longest match length" (step S7).

[0032] In step S7, "0" is written to and stored in the match position storage area of the memory 3, and "3" in the longest matched data length storage area. The process then moves to processing step S8, "take the next entry in the chain". Steps S6 and S7 correspond to the selection step.

[0033] In processing step S8, the next (in this case, the first) data of the chain for index 0x03 is read. The "no registered data" code is read (see FIG. 6), so the process proceeds to "YES" at the next step S3 and moves to step R3. Step S2 corresponds to the search step, and steps S3 and S8 correspond to the repetition step.

[0034] In step R3, the longest matched data length obtained by the search processing of step R2 is at least the predetermined value "3". The data compression logic 7 therefore writes the match position "0" and the matched data length "3" as match information to address 5 of the output data string, for example as 2-byte data with 1 byte for each value. Then, to indicate that address 5 holds match information, a setting is made in the management part at address 0.

[0035] This management unit is 1 byte in size, the same as other character data, and is inserted every eight data items starting from the 0th address of the output data string. Its content indicates which of the eight data items that follow are match information output as a result of data compression processing. To perform the setting in this case, for example, the content of the management unit at the 0th address is read and, so that new information is added without clearing bits already set to "1", it is combined by exclusive OR with a 1-byte value in which only the fourth bit (corresponding to the fifth address) is set to "1". At this point the management unit is still 0, having been cleared during initialization in step R1, so the result of the exclusive OR has only the fourth bit set to "1", as shown in FIG. 4(c). The process then moves to step R4.
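The flag-setting operation described above can be sketched as follows. This is an illustration only, not the patent's implementation: the function name `set_match_flag` is hypothetical, and bit 0 is assumed to be the least significant bit, which this section does not state.

```python
def set_match_flag(flags, item_index):
    """Return the management byte with the flag for one data item set.

    item_index counts the eight items covered by the management byte from 0,
    so the item at output address 5 (under a management byte at address 0)
    has item_index 4. Bit 0 is assumed to be the least significant bit.
    """
    if not 0 <= item_index < 8:
        raise ValueError("a management byte covers exactly eight items")
    # XOR as in the text. It behaves like OR here because each item is
    # flagged at most once; applying it twice would clear the bit again.
    return flags ^ (1 << item_index)


flags = 0x00                       # cleared during initialization (step R1)
flags = set_match_flag(flags, 4)   # fifth address holds match information
```

With the least-significant-bit assumption, the management byte becomes 0x10 after this call.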

[0036] In step R4, position information "4" is registered in the hash table. At this point, as shown in FIG. 6, position information "0" is already registered in the area indicated by index 0x03 of the hash table, so that position information "0" is moved to the first area of the chain. Position information "4" is then written to the area indicated by index 0x03 of the hash table, updating the registration. FIG. 7 shows the state after this update. After that, the pointer into the data string to be compressed is incremented to "5", the loop counter is incremented, and the process moves to step R5.
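The update registration in step R4 can be sketched as below. The sketch assumes the hash is the plain exclusive OR of the two character codes, which reproduces index 0x03 for "AB"; the exact hash definition lies outside this section. The names `hash2`, `register`, `table`, and `chains`, and the input bytes other than "AB" at positions 0 and 4, are hypothetical.

```python
NO_DATA = None  # stands in for the "no registered data" code


def hash2(data, pos):
    """Hash the two-character string at pos by XOR of its character codes."""
    return data[pos] ^ data[pos + 1]


def register(table, chains, data, pos):
    """Register pos as the newest entry for the pair starting at pos."""
    idx = hash2(data, pos)
    old = table.get(idx, NO_DATA)
    if old is not NO_DATA:
        # The displaced position moves to the first area of the chain.
        chains.setdefault(idx, []).insert(0, old)
    table[idx] = pos
    return idx


data = b"ABCDABCE"                 # hypothetical input with "AB" at 0 and 4
table, chains = {}, {}
register(table, chains, data, 0)   # state corresponding to FIG. 6
idx = register(table, chains, data, 4)   # state corresponding to FIG. 7
```

After the second call, `table[0x03]` holds 4 and the chain for index 0x03 holds `[0]`, so the newest position is always the first one searched.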

[0037] In step R5, the number of data items to process is set to "3", because the longest matching data length obtained by the selection logic 6 in step S5 is "3", while the loop counter is 1. The judgment is therefore "NO", and the process loops twice more, executing step R4 two more times. That is, the character string "BC" is hashed and its position information "5" is registered, and the character string "CE" is hashed and its position information "6" is registered, each in the area indicated by its own hash-table index. When this processing ends, the pointer into the data string to be compressed is "7". In step R6, the data has not yet ended, so the judgment is "NO" and the process returns to step R2.

[0038] In step R2, the character string "EA" at the seventh and eighth addresses is processed in the same way as above, with output to the output data string and registration in the hash table. The following description focuses on the data string "AB". When the character string "AB" at the eighth and ninth addresses is processed, hashing in step S2 and searching the hash table yields position information "4" (see FIG. 7). In steps S4 and S5, the character string starting at the fourth address is compared one data item at a time, and the matching data length obtained is "2". In step S6, this is compared with the initialized longest matching data length "0", the judgment is "YES", and the process moves to step S7. In step S7, matching position "4" and longest matching data length "2" are registered, and the process moves to step S8.

[0039] In step S8, the data stored in the first area of the chain for index 0x03 is read. This yields position information "0", so step S3 judges "NO" and the process moves to step S4; step S4 judges "YES" and the process moves to step S5. Here too the matching data length obtained in step S5 is "2", so the process likewise moves through steps S6, S8, S9, S10, and S3. In step S10 the "no data" code stored in the second area of the chain is obtained, so step S3 judges "YES" and the process moves to step R3.

[0040] In step R3, a matching data length of "2" gives no compression benefit (the match information itself occupies 2 bytes), so the leading character "A" is output unchanged to the output data string. In step R5, position information "4" registered in the hash table is moved to the chain, and position information "8" is registered in the hash table (see FIG. 8). The character strings "BB" and "BA" are then processed in the same way.

[0041] Next, search processing is performed for the character string "AB" at the eleventh and twelfth addresses. First, the character strings are compared for position information "8" registered in the hash table; they differ at the third character, giving a matching data length of "2". The next search obtains position information "4" stored in the first area of the chain, and the comparison again gives a matching data length of "2". The search after that obtains position information "0" stored in the second area of the chain, and the comparison once more gives a matching data length of "2". The following search detects "no data" and the search processing ends; since the longest matching data length is "2", no compression is performed.
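The search just walked through can be sketched as a loop over the candidates, newest first. The function names and the input bytes are hypothetical; the input was chosen only so that "AB" occurs at positions 0, 4, 8, and 11 with the match lengths given in the text. The strict comparison mirrors step S6, so the first candidate wins among equal lengths.

```python
def match_length(data, cand, cur):
    """Count how many items match when comparing from cand and from cur."""
    n = 0
    while cur + n < len(data) and data[cand + n] == data[cur + n]:
        n += 1
    return n


def find_longest(data, cur, candidates):
    """Return (position, length) of the longest match among the candidates."""
    best_pos, best_len = None, 0
    for cand in candidates:              # hash-table entry, then chain areas
        n = match_length(data, cand, cur)
        if n > best_len:                 # step S6: strictly longer than best
            best_pos, best_len = cand, n     # step S7: update
    return best_pos, best_len


data = b"ABCDABCEABBABD"                 # hypothetical input
first = find_longest(data, 4, [0])       # "AB" at address 4
later = find_longest(data, 11, [8, 4, 0])    # "AB" at address 11
```

For this input, the search at address 4 finds a match of length 3 at position 0, while the search at address 11 finds only length 2 for every candidate, which is below the threshold of 3 and therefore leads to literal output.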

[0042] Thereafter, as shown in FIGS. 9 and 10, the chain for the character string "AB", that is, for index 0x03, grows progressively longer as position information "11" and "16" of the input data is registered in the hash table in turn and the displaced position information is moved to the chain. Among these entries, the one giving the longest matching data length is output to the output data string as match information in place of the character string. When data compression has been performed in the same way on all of the data to be compressed, step R6 takes the "YES" branch and the process moves to processing step R7, "post-processing". In step R7, the output data in the memory 3, which is the compressed data shown in FIG. 4(b), is output to the external storage device by the data input/output device 1, and the data compression processing ends.

[0043] As described above, in this embodiment the control circuit 2 uses the hash method to search the already-compressed data for candidates for the same data string as the compression target data string, which starts at the position following the already-compressed data within the data to be compressed, and compares each data string found with the compression target data string to obtain the matching data length. The control circuit 2 repeats this search and comparison until no search candidates remain in the already-compressed data. The control circuit 2 further selects and outputs the data string with the longest of the matching data lengths obtained by the comparison, and registers the compression target data string in the hash table as a search candidate for the hash method. Because the hash method is used to search for the same data string as the compression target data string, the search can be executed at high speed and the memory space used can be kept sufficiently small.
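The overall flow of the first embodiment (search, compare, select, output, register) can be sketched end to end as follows. This is a simplified illustration under stated assumptions: the hash is taken as the plain XOR of two character codes, output is a list of Python tokens (an integer for a literal, a `(position, length)` tuple for match information) rather than the byte layout with management units, and all names and the sample input are hypothetical. A small decoder is included only to check the round trip.

```python
MIN_MATCH = 3  # threshold used in step R3


def compress(data):
    """Return a token list: int for a literal, (position, length) for a match."""
    table, chains, tokens = {}, {}, []
    pos = 0
    while pos < len(data):
        best_pos, best_len = 0, 0
        if pos + 1 < len(data):                       # a pair is needed to hash
            idx = data[pos] ^ data[pos + 1]           # assumed XOR hash
            cands = ([table[idx]] if idx in table else []) + chains.get(idx, [])
            for cand in cands:                        # newest candidate first
                n = 0
                while pos + n < len(data) and data[cand + n] == data[pos + n]:
                    n += 1
                if n > best_len:                      # keep the longest match
                    best_pos, best_len = cand, n
        if best_len >= MIN_MATCH:
            tokens.append((best_pos, best_len))
            step = best_len
        else:
            tokens.append(data[pos])                  # literal output
            step = 1
        for p in range(pos, pos + step):              # register covered positions
            if p + 1 < len(data):
                idx = data[p] ^ data[p + 1]
                if idx in table:
                    chains.setdefault(idx, []).insert(0, table[idx])
                table[idx] = p
        pos += step
    return tokens


def decompress(tokens):
    """Rebuild the original bytes from the token list."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            start, length = tok
            for i in range(length):                   # byte by byte, so overlap is safe
                out.append(out[start + i])
        else:
            out.append(tok)
    return bytes(out)


sample = b"ABCDABCEABBABD"                            # hypothetical input
tokens = compress(sample)
```

Because every candidate is verified by direct comparison, hash collisions (for example "AB" and "BA" under a plain XOR) only cost extra comparisons and do not affect correctness.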

[0044] FIG. 11 shows a second embodiment of the present invention. Parts identical to the first embodiment are given the same reference numerals and their description is omitted; only the differences are described below. In FIG. 11, a processing step S9, "search count counter + 1", and a judgment step S10, "search count = limit count?", are inserted between steps S7 and S8 of the flowchart of FIG. 2 of the first embodiment. The rest of the second embodiment is the same as the first embodiment.

[0045] Next, the operation of the second embodiment is described. In the second embodiment, when step S7 finishes, the process moves to the next processing step S9, "search count counter + 1". In step S9, each time the search logic 4 has searched the search table once for a given data string, the search count counter initialized in step S1 is incremented by one, and the process moves to the next judgment step S10, "search count = limit count?".

[0046] In step S10, it is judged whether the value of the search count counter has become equal to the value predetermined as the limit on the number of searches of the search table. Here the limit is set to "3". If the counter value equals the limit in step S10, the process takes the "YES" branch and the search processing is complete. If the counter value does not equal the limit (that is, it is smaller than the limit), the process takes the "NO" branch and moves to step S8. The repeating step of the first embodiment with steps S9 and S10 added corresponds to the repeating step of the second embodiment.

[0047] Here, the second embodiment is likewise described with attention to the character string "AB", assuming that the same data as in the first embodiment is compressed. As shown in FIG. 7, up to the stage where position information "4" of the character string "AB" is registered in the hash table at the index for hash value 0x03, the value of the search count counter in step S9 is less than "3", so step S10 judges "NO" and the process moves to step S8. Processing is therefore the same as in the first embodiment.

[0048] Then, at the stage shown in FIG. 8 where the next position information "8" has been registered, when the character string "AB" at the eleventh and twelfth addresses is searched, position information "0" in the second area of the chain is read in step S8, steps S3 through S7 are processed as before, and in the next step S9 the search count counter reaches the limit of "3". Step S10 then takes the "YES" branch, and the search processing is terminated without any further search or comparison.

[0049] Also, as shown in FIGS. 9 and 10, even when data compression proceeds further and the chain grows longer as position information "11" and "16" is registered in the hash table in turn, if the character string "AB" appears later in the data string to be compressed, the search processing in step R2 is performed only three times, that is, only for position information "16", "11", and "8". The search table is not searched further, and the candidate giving the longest matching data length among these is output to the output data string as match information.
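The bounded search of the second embodiment can be sketched as below. The function name, the `limit` parameter, and the input bytes are hypothetical; the input was constructed so that "AB" occurs at positions 0, 4, 8, 11, and 16. The sketch assumes the counter advances once per candidate examined, which matches the three searches described above.

```python
def find_longest_bounded(data, cur, candidates, limit=3):
    """Return (position, length, searches), stopping after `limit` candidates."""
    best_pos, best_len, searches = None, 0, 0
    for cand in candidates:                  # newest candidate first
        n = 0
        while cur + n < len(data) and data[cand + n] == data[cur + n]:
            n += 1
        if n > best_len:
            best_pos, best_len = cand, n
        searches += 1                        # step S9: search count counter + 1
        if searches == limit:                # step S10: search count = limit?
            break                            # terminate the search
    return best_pos, best_len, searches


data = b"ABCDABCEABBABDEFABGHABCD"           # hypothetical input
chain = [16, 11, 8, 4, 0]                    # candidates for "AB", newest first
bounded = find_longest_bounded(data, 20, chain)            # limit of 3
unbounded = find_longest_bounded(data, 20, chain, limit=10)
```

For this deliberately unfavourable input the bounded search settles for a length-2 candidate at position 16, while the longer search finds a length-4 match at position 0. This is the trade-off the second embodiment accepts in exchange for a fixed search cost.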

[0050] In the second embodiment, because not all of the matching data strings present in the search table are searched, the longest matching data length may not be obtained. However, ordinary text data has the property that the data string with the longest match tends to appear close to the position of interest. Therefore, even with a scheme that searches only three times starting from the most recent matching data string, as in the second embodiment, compression efficiency hardly degrades.

[0051] As described above, in the second embodiment the number of searches performed is counted, the search is terminated when the count equals the limit preset to three, and the head position of the data string showing the longest matching data length at that point, together with that matching data length, is output as match information. Unlike the first embodiment, which repeats the search of the search table until no matching data strings remain, the time required for the search therefore stays the same each time even when the chain grows long, and data compression can be executed still faster. Moreover, data compression efficiency is hardly reduced.

[0052] The present invention is not limited to the embodiments described above and shown in the drawings, and can be modified as follows. First, although the hexadecimal codes of the character data are JIS codes, EBCDIC or ASCII codes may be used instead. Although the hash function is an exclusive OR, it is not limited to this; any suitable operation (not necessarily a logical operation) that keeps collisions as few as possible may be selected. Further, although the predetermined value of the longest matching data length at which match information is output in step R3 is "3", it may be set to a suitable value of "4" or more. In addition, the number of characters hashed may be three or more.

[0053] Although the match information (2 bytes of data) consists of 1 byte representing the matching position (address) and 1 byte representing the matching data length, this is not limiting. It may be changed as appropriate to balance the size of the data to be compressed against the expected maximum matching data length; for example, the matching position may be 12 bits and the matching data length 4 bits. Further, the size of the match information is not limited to 2 bytes; if more bits are needed for the size of the data to be compressed, it may be, for example, 3 bytes. In that case, the minimum matching data length at which match information is output may be changed as appropriate, for example from "3" to "4", taking compression efficiency into account according to the size of the match information.
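The 12-bit position and 4-bit length split mentioned above can be sketched as a pair of packing helpers. The function names and the bit order (position in the upper 12 bits, length in the lower 4 bits, big-endian byte order) are assumptions; the text gives only the field widths.

```python
def pack_match(position, length):
    """Pack a 12-bit position and a 4-bit length into 2 bytes."""
    if not 0 <= position < (1 << 12):
        raise ValueError("position must fit in 12 bits")
    if not 0 <= length < (1 << 4):
        raise ValueError("length must fit in 4 bits")
    value = (position << 4) | length         # assumed layout: pppppppp pppplllll
    return bytes([value >> 8, value & 0xFF])


def unpack_match(two_bytes):
    """Recover (position, length) from 2 bytes produced by pack_match."""
    value = (two_bytes[0] << 8) | two_bytes[1]
    return value >> 4, value & 0x0F


packed = pack_match(0x123, 5)
restored = unpack_match(packed)
```

With this split, positions up to 4095 can be addressed, at the cost of limiting the stored length value to 15.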

[0054] Furthermore, the data to be compressed is not limited to text in languages written with an alphabet; it may be in other languages such as Japanese. When the hexadecimal code representing one character of the language is 2 bytes, the management unit may be 2 bytes in size, with one management unit added for every 16 characters.

[0055] Although the limit on the number of searches (the termination count) in the repeating logic is three, this count may be chosen according to whether data compression efficiency or processing speed is given more weight. For example, the search limit is set to a large value when compression efficiency is emphasized and to a small value when processing speed is emphasized. When a reasonable balance between compression efficiency and processing speed is the priority, the search limit is preferably set to a value from 3 to 20; the most preferable limit is around 10. The user may also be allowed to specify the search limit according to the nature of the data to be compressed.

[0056] In each of the above embodiments, the most recent compression target data string is registered at the position in the hash table that is searched first. This is not limiting; the most recent compression target data string may instead be registered at the position that is searched last.

[0057] Also, in step S5 of each of the above embodiments, when the longest matching data length defined as a limit is detected (a limit set by the physical size of the storage area for the longest matching data length, or a length beyond which no match is expected given the nature of the text data), subsequent searches may be terminated. This further speeds up compression.

[0058] Further, a processing step S11, "store search count", may be provided immediately before "return" in the flowchart of FIG. 11, so that for each data string search the number of the search at which the data string with the longest match was found is stored in the memory 3. Then, once data compression has progressed to some extent, a "search count optimization" routine may be executed. In this routine, suitable statistical processing is applied to the search counts stored in step S11, so that a search count that keeps compression efficiency sufficiently good while sufficiently shortening search time is automatically determined and set (optimized).
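The text leaves the "suitable statistical processing" unspecified. One possible reading, shown purely as an assumption, is to pick the smallest limit that would still have found the longest match in a chosen percentage of past searches, clamped to the 3 to 20 range given as preferable elsewhere in the description. The function name and parameters are hypothetical.

```python
def choose_search_limit(found_at, coverage_pct=95, lo=3, hi=20):
    """Pick a search limit from recorded search numbers.

    found_at holds, for each past search, the 1-based search number at which
    the longest match was found (the values stored in step S11). The
    percentile rule and the clamping range are assumptions of this sketch.
    """
    if not found_at:
        return lo
    ordered = sorted(found_at)
    # Index of the smallest value covering coverage_pct percent of the records
    # (integer arithmetic for a ceiling division).
    k = (coverage_pct * len(ordered) + 99) // 100 - 1
    return max(lo, min(hi, ordered[k]))


mostly_near = [1] * 90 + [2] * 5 + [7] * 5    # hypothetical statistics
limit_near = choose_search_limit(mostly_near)
limit_far = choose_search_limit([1] * 50 + [12] * 50)
```

When nearly all longest matches are found within the first two searches the rule returns the lower bound of 3, and when half of them are found only at the twelfth search it returns 12.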

[0059] The output data produced by compression according to the above embodiments and modifications may be compressed further by applying another compression method, such as arithmetic coding. This further improves data compression efficiency.

【００６０】[0060]

[Effects of the Invention] As is clear from the above description, in data compression processing that compresses data such as document data using the LZ slide dictionary method, the present invention searches the already-compressed data by the hash method for the same data string as the compression target data string. Data strings can therefore be searched at high speed, and the memory space needed for the search can be kept to a minimum.

[0061] Further, in this configuration, terminating the search and comparison after a predetermined number of repetitions allows the data search, and the data compression that depends on it, to be executed still faster. Setting the number of searches before termination to a value from 3 to 20 gives a reasonable balance between data compression efficiency and compression speed. Registering the most recent compression target data string at the position searched first among the hash-method search candidates raises the probability of obtaining the longest matching data length. Finally, statistically storing the number of searches at which the longest matching data length was detected, so that the termination count is automatically optimized and determined, allows data compression efficiency and compression speed to be optimized.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of a first embodiment of the present invention.

FIG. 2 is a flowchart of the control contents.

FIG. 3 is a flowchart of the control contents of step R2 in FIG. 2.

FIG. 4 is a diagram showing the input data, the output data, and the contents of the management unit.

FIG. 5 is a conceptual diagram showing the contents of the search table in its initialized state.

FIG. 6 is a conceptual diagram showing the update registration processing of the hash table.

FIG. 7 is a diagram corresponding to FIG. 6.

FIG. 8 is a diagram corresponding to FIG. 6.

FIG. 9 is a diagram corresponding to FIG. 6.

FIG. 10 is a diagram corresponding to FIG. 6.

FIG. 11 is a diagram corresponding to FIG. 2, showing a second embodiment of the present invention.

FIG. 12 is a conceptual diagram of data compression processing by the LZ slide dictionary method.

[Explanation of symbols]

1 denotes the data input/output device, 4 the search logic (search means), 5 the match comparison logic (comparison means), 6 the selection logic (selection means), 7 the data compression logic, and 8 the registration logic (registration means).

Claims

[Claims]

1. A data compression method for compressing data such as document data using the LZ slide dictionary method, comprising: a search step of searching, by a hash method, the already-compressed data for candidates for the same data string as a compression target data string that starts at the position following the already-compressed data within the data to be compressed; a comparison step of comparing the data string found in the search step with the compression target data string to obtain a matching data length; a repeating step of repeatedly executing the search step and the comparison step until no search candidates for the same data string as the compression target data string remain in the already-compressed data; a selection step of selecting and outputting the data string with the longest data length from among the matching data lengths obtained in the comparison step; and a registration step of registering the compression target data string in a hash table as a search candidate for the hash method.

2. The data compression method according to claim 1, wherein in the repeating step, after the search step and the comparison step have been repeated a predetermined number of times, further execution of the search step and the comparison step is terminated.

3. The data compression method according to claim 2, wherein the number of times of termination in the repeating step is set to 3 to 20 times.

4. The data compression method according to any one of claims 1 to 3, wherein in the registration step the compression target data string is registered at the position searched first among the search candidates of the hash method.

5. The data compression method according to claim 2, wherein the number of searches before termination in the repeating step is automatically optimized and determined by statistically storing the number of searches at which the longest matching data length was detected in the comparison step.

6. A data compression apparatus for compressing data such as document data using the LZ slide dictionary method, comprising: search means for searching, by a hash method, the already-compressed data for candidates for the same data string as a compression target data string that starts at the position following the already-compressed data within the data to be compressed; comparison means for comparing the data string found by the search means with the compression target data string to obtain a matching data length; repeating means for repeatedly executing the search and comparison of candidates for the same data string as the compression target data string until no search candidates remain in the already-compressed data; selection means for selecting and outputting the data string with the longest data length from among the matching data lengths obtained by the comparison means; and registration means for registering the compression target data string in a hash table as a search candidate for the hash method.

7. The data compression apparatus according to claim 6, wherein the repeating unit is configured to terminate the subsequent search and comparison after repeating the search and comparison a predetermined number of times.

8. The data compression apparatus according to claim 7, wherein the number of repetitions before termination in the repeating means is set to 3 to 20.

9. The data compression apparatus according to claim 6, wherein the registration means registers the compression target data string at the position searched first among the search candidates of the hash method.

10. The data compression apparatus according to claim 6 or 9, wherein the repeating means statistically stores the number of searches at which the comparison means detected the longest matching data length, thereby automatically optimizing and determining the number of searches and comparisons before termination.