JP2002135128A

JP2002135128A - Data compression method, data compression/expansion method, data compression device, and data compression/expansion device

Info

Publication number: JP2002135128A
Application number: JP2000326119A
Authority: JP
Inventors: Toshiyuki Hirose; 寿幸廣瀬
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2000-10-25
Filing date: 2000-10-25
Publication date: 2002-05-10

Abstract

PROBLEM TO BE SOLVED: To enable data compression that can be performed in a single pass and that achieves a sufficient compression ratio with a small dictionary. SOLUTION: A compression/expansion section 12 comprises a slide dictionary part 121, a base pointer part 122, an interval calculation part 123, and a gamma-encoding part 124. It performs data compression by a slide dictionary method in four steps: it stores read data as the contents of a dictionary; it references the dictionary for encoding starting from the position of a base pointer and finds the longest matching character string; it encodes the interval from the base pointer to the first matching character; and it moves the base pointer to the position of the matching character before the next character is encoded.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention. The present invention relates to a data compression method for compressing data such as text data, a data compression/decompression method, a data compression device, and a data compression/decompression device.

[0002]

2. Description of the Related Art. With recent advances in information processing technology, computers now process huge amounts of data of many kinds, such as character codes, vector information, and images. Data compression has long been known as a technique for saving secondary-storage capacity and for shortening communication time in data communication. It works by removing redundant information contained in the data, by converting a data string into another, shorter data string, and by similar means.

[0003] Data compression techniques fall into two types:

- Irreversible (lossy) compression, in which the compressed data cannot be restored completely to the original data. Because it cannot restore data completely, it is mainly used for data such as audio and images.
- Reversible (lossless) compression, in which the compressed data can be restored completely to the original data. It is used for secondary storage devices and other applications that require the data to be preserved exactly.

[0004] Lossless compression methods fall into two families:

- Statistical (probability-based) coding methods, such as Huffman coding and arithmetic coding. These generate individual codes from the occurrence probabilities of the characters in the data.
- Dictionary coding methods, such as the slide dictionary method (LZ77 code) and the dynamic dictionary method (LZ78 code). These exploit the repeated occurrence of character strings in the data.

[0005] The slide dictionary method (LZ77 code) is a data compression algorithm published by J. Ziv and A. Lempel in 1977. It registers already-read data as a dictionary and finds the longest matching character string in it. It then replaces the original code with the position and length of the match and outputs the result. The method uses a local dictionary whose range is limited to part of the data. Character strings are extracted from this dictionary through a window, called a sliding window, that moves along the character string.

[0006] More precisely, in the slide dictionary method (LZ77 code), the character string that includes the character currently being encoded is called the sliding window, and the character strings in the sliding window are the objects of reference. Each codeword is a binary number made up of three fields:

- m: the position (offset) of the longest matching string from the start of the sliding window;
- l: the length of the longest matching string in the sliding window;
- x: the first character that did not match, that is, the character following the longest matching string.

If the string does not occur in the sliding window, m = 0 and l = 0. Each time a codeword is output, the sliding window is shifted forward by l + 1 characters.
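The (m, l, x) scheme of [0005] and [0006] can be illustrated with a small encoder/decoder pair. This is a generic Python sketch, not the method of any particular publication, and it rests on several assumptions:

- The window is the `window_size` characters that precede the current position.
- m is the 1-based position of the match within that window, which leaves m = 0 free to mean "no match".
- A match may run on into the characters being encoded, with no lookahead limit.
- Triples are kept as tuples rather than packed into binary.

The names `lz77_encode` and `lz77_decode` are illustrative.

```python
from typing import List, Tuple

Triple = Tuple[int, int, str]  # (m, l, x)


def lz77_encode(text: str, window_size: int = 16) -> List[Triple]:
    """Encode text as LZ77-style (m, l, x) triples."""
    out: List[Triple] = []
    pos = 0
    while pos < len(text):
        start = max(0, pos - window_size)  # window = preceding characters
        best_m, best_l = 0, 0              # m = l = 0 means "no match"
        for cand in range(start, pos):
            length = 0
            # Extend the match, always leaving one character for x.
            while (pos + length < len(text) - 1
                   and text[cand + length] == text[pos + length]):
                length += 1
            if length > best_l:
                best_m, best_l = cand - start + 1, length  # 1-based offset
        x = text[pos + best_l]             # character following the match
        out.append((best_m, best_l, x))
        pos += best_l + 1                  # slide forward by l + 1 characters
    return out


def lz77_decode(codes: List[Triple], window_size: int = 16) -> str:
    """Rebuild the text from (m, l, x) triples."""
    buf: List[str] = []
    for m, l, x in codes:
        start = max(0, len(buf) - window_size)
        src = start + m - 1
        for i in range(l):                 # copy one at a time so overlaps work
            buf.append(buf[src + i])
        buf.append(x)
    return "".join(buf)
```

For example, `lz77_encode("abababc")` yields `[(0, 0, "a"), (0, 0, "b"), (1, 4, "c")]`. The third triple copies an overlapping match of length 4.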

[0007] Thus, in the slide dictionary method (LZ77 code), the longest matching character string is found in the past text, and its start position and length are used as the codeword.

[0008] This type of technology is described in, for example, Japanese Patent Application Laid-Open No. 60-116228.

[0009] The slide dictionary method (LZ77 code) has several variations. They differ in how the start position and length are represented and in how the longest matching string is searched for.

[0010]

Problems to Be Solved by the Invention

Data compression using Huffman coding, arithmetic coding, and the like requires two-pass processing. The processing load is therefore heavy, which makes it impractical in hardware, so it must be performed in software.

[0011] The dynamic dictionary method (LZ78 code) is a dictionary coding method that exploits the repeated occurrence of character strings in the data. Compression with it can be processed in one pass, but it requires a large memory for the dictionary.

[0012] The slide dictionary method (LZ77 code) is also a dictionary coding method. Compression with it can be processed in one pass with a relatively small dictionary. However, a further reduction in dictionary size and an improvement in compression ratio are still needed.

[0013] In view of this conventional situation, an object of the present invention is to provide a data compression method, a data compression/decompression method, a data compression device, and a data compression/decompression device that process data in one pass and achieve a sufficient compression ratio with a small dictionary.

[0014]

Means for Solving the Problems

The present invention is a data compression method based on a slide dictionary method. In this method, read data is registered as a dictionary, the longest matching character string is found in the dictionary, and the original code is replaced with the position and length of the match for output. The method is characterized by three features:

- During encoding, the dictionary is referenced starting from the position of a base pointer.
- The interval from the base pointer to the first matching character is encoded.
- Before the next character is encoded, the base pointer is moved to the position where the character matched.
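The core step of [0014] can be sketched in Python for a fixed dictionary. The hypothetical `encode_intervals` function treats the dictionary as a ring of addresses. It searches once around the ring starting just right of the base pointer, as described for the worked example of FIG. 5, so each interval lies in 1..N. The sketch deliberately leaves out several parts of the full method: dictionary updates, run lengths, literals, and the Table 1 bit codes.

```python
from typing import List


def encode_intervals(text: str, dictionary: str, base: int = 0) -> List[int]:
    """Hypothetical sketch: intervals measured from a moving base pointer."""
    n = len(dictionary)
    intervals: List[int] = []
    for ch in text:
        # Search once around the ring, starting just right of the base pointer.
        for k in range(1, n + 1):
            addr = (base + k) % n
            if dictionary[addr] == ch:
                break
        else:
            # The literal path ("1" + character) is not modeled in this sketch.
            raise ValueError(f"{ch!r} is not in the dictionary")
        intervals.append(k)  # interval = match address - base pointer (mod n)
        base = addr          # move the base pointer onto the matching character
    return intervals
```

Run on the dictionary state of FIG. 5(A), `encode_intervals("CDEJ", "ABCDEFGHIJ")` gives the intervals 2, 1, 1, 5.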

[0015] The present invention also provides a data compression/decompression method based on a slide dictionary method. In this method, read data is registered as a dictionary, the longest matching character string is found in the dictionary, and the original code is replaced with the position and length of the match for output. The method is characterized as follows.

During compression, encoding is performed by repeating this process: the dictionary is referenced starting from the position of a base pointer, the interval from the base pointer to the first matching character is encoded, and the base pointer is moved to the position where the character matched before the next character is encoded.

During decompression, the encoded characters are decoded by repeating the corresponding process: the dictionary is referenced starting from the position of the base pointer, the interval from the base pointer to the first matching character is decoded, and the base pointer is moved to the position where the character matched before the next character is decoded.
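The decoding side described in [0015] is the mirror image of interval encoding. Given the same dictionary and starting base pointer, each interval locates the next character, and the base pointer moves onto it. The sketch below uses the hypothetical `decode_intervals` function and, like an interval-only encoder, assumes a static ring-shaped dictionary with no literals or run lengths.

```python
from typing import List


def decode_intervals(intervals: List[int], dictionary: str, base: int = 0) -> str:
    """Hypothetical sketch: recover characters from base-pointer intervals."""
    n = len(dictionary)
    out: List[str] = []
    for k in intervals:
        base = (base + k) % n         # the character sits k addresses past the base,
        out.append(dictionary[base])  # and the base pointer moves onto it
    return "".join(out)
```

With the dictionary "ABCDEFGHIJ" and the base pointer at address 0, the intervals 2, 1, 1, 5 decode back to "CDEJ".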

[0016] The present invention also provides a data compression device that performs data compression by a slide dictionary method. In this method, read data is registered as a dictionary, the longest matching character string is found in the dictionary, and the original code is replaced with the position and length of the match for output. The device comprises:

- a slide dictionary unit in which read data is registered as a dictionary;
- a base pointer that designates the position at which the previous character matched, as the reference position in the slide dictionary unit during encoding;
- an interval calculation unit that calculates the interval from the base pointer to the first matching character; and
- an encoding unit that encodes the interval calculated by the interval calculation unit.

[0017] Furthermore, the present invention provides a data compression/decompression device based on a slide dictionary method. In this method, read data is registered as a dictionary, the longest matching character string is found in the dictionary, and the original code is replaced with the position and length of the match for output. The device comprises:

- compression processing means, which performs encoding by repeating this process: the dictionary is referenced during encoding starting from the position of a base pointer, the interval from the base pointer to the first matching character is encoded, and the base pointer is moved to the position where the character matched before the next character is encoded; and
- decompression processing means, which decodes the encoded characters by repeating the corresponding process: the dictionary is referenced during decoding starting from the position of the base pointer, the interval from the base pointer to the first matching character is decoded, and the base pointer is moved to the position where the character matched before the next character is decoded.

[0018]

Embodiments of the Invention

Embodiments of the present invention will now be described in detail with reference to the drawings.

[0019] The present invention is applied to, for example, a tape streamer drive 10 configured as shown in FIG. 1.

[0020] The tape streamer drive 10 has a SCSI (Small Computer System Interface) interface 11 for exchanging data with an external host computer 20. The components are connected as follows:

- A compression/decompression unit 12 is connected to the SCSI interface 11.
- An IF/ECC controller 13 is connected to the compression/decompression unit 12.
- A buffer memory 14, a recording RF processing unit 15, and a reproduction RF processing unit 16 are connected to the IF/ECC controller 13.
- Recording heads 18A and 18B and reproduction heads 19A and 19B are connected to the recording RF processing unit 15 and the reproduction RF processing unit 16 via a rotary transformer 17.

[0021] The tape streamer drive 10 records data on, and reproduces data from, a magnetic tape 1 by a helical scan method. The recording heads 18A and 18B and the reproduction heads 19A and 19B are mounted on a rotating drum (not shown). The recording heads 18A and 18B have two gaps with different azimuth angles placed very close to each other. The reproduction heads 19A and 19B also have different azimuth angles from each other, but are arranged, for example, 90 degrees apart.

[0022] During data recording, the host computer 20 inputs data to the tape streamer drive 10 through the SCSI interface 11. The data arrives sequentially in fixed-length transmission units called records and is supplied to the compression/decompression unit 12. Tape streamer drive systems of this kind also have a mode in which the host computer 20 transmits data in variable-length units.

[0023] The compression/decompression unit 12 compresses or decompresses the input data as needed, according to instructions from a system controller (not shown). It uses the data compression/decompression method according to the present invention.

[0024] Data compressed by the compression/decompression unit 12 is supplied to the IF/ECC controller 13. The IF/ECC controller 13 temporarily stores this data in the buffer memory 14. It ultimately handles the data in fixed-length units called groups, each corresponding to 40 tracks of the magnetic tape, and applies ECC format processing to it.

[0025] In the ECC format processing, the IF/ECC controller 13 adds an error correction code to the recording data and modulates the data to suit magnetic recording. The recording data that has undergone ECC format processing is supplied to the recording RF processing unit 15. That unit applies amplification, recording equalization, and similar processing to generate a recording signal. The recording signal is supplied via the rotary transformer 17 to the recording heads 18A and 18B and is recorded on the magnetic tape 1.

[0026] The data reproduction operation proceeds as follows:

1. The reproduction heads 19A and 19B read the data recorded on the magnetic tape 1 as an RF reproduction signal. This signal is supplied via the rotary transformer 17 to the reproduction RF processing unit 16.
2. The reproduction RF processing unit 16 applies reproduction equalization, reproduction clock generation, binarization, and decoding (for example, Viterbi decoding) to the RF reproduction signal.
3. The data read in this way is supplied to the IF/ECC controller 13, temporarily stored in the buffer memory 14, and subjected to error correction and related processing.
4. The processed data is read from the buffer memory 14 at a predetermined time and supplied to the compression/decompression unit 12.

[0027] The compression/decompression unit 12 acts on a determination by the system controller (not shown). If the data was compressed by the compression/decompression unit 12 at recording time, the unit decompresses it. If the data is uncompressed, the unit outputs it as-is without decompression.

[0028] The output data of the compression/decompression unit 12 is output to the host computer 20 as reproduction data via the SCSI interface 11.

[0029] As shown in FIG. 2, the compression/decompression unit 12 of the tape streamer drive 10 is functionally composed of four parts: a slide dictionary unit 121, a base pointer unit 122, an interval calculation unit 123, and a gamma encoding unit 124. It compresses data according to the procedure in the flowchart of FIG. 3 and decompresses data according to the procedure in the flowchart of FIG. 4.

[0030] As shown in the flowchart of FIG. 3, compression by the compression/decompression unit 12 begins with step S1. In step S1, the slide dictionary is initialized by filling it with zeros, and the write pointer, the base pointer, and the run length are set to 0.

[0031] In step S2, one character is input. In step S3, the dictionary is searched for "previous character" + "current character". In step S4, it is determined whether a match is found.

[0032] If the result of step S4 is YES, that is, if "previous character" + "current character" matches a character string in the slide dictionary, the process proceeds to step S7. There the interval is calculated as interval = match address - base pointer.

[0033] If the result of step S4 is NO, that is, if "previous character" + "current character" does not match any character string in the slide dictionary, the process proceeds to step S5. There the dictionary is searched for the "current character" alone. In step S6, it is determined whether a match is found.

[0034] If the result of step S6 is YES, that is, if the "current character" matches a character in the slide dictionary, the process proceeds to step S7. There the interval is calculated as interval = match address - base pointer.

[0035] In step S8, it is determined whether the interval calculated in step S7 is 1.

[0036] If the result of step S8 is YES, that is, if the interval is 1, the process proceeds to step S9, where the run length is incremented (run length = run length + 1). The process then proceeds to step S12, where the base pointer is moved to the match address (base pointer = match address), and then to step S16.

[0037] If the result of step S8 is NO, that is, if the interval is not 1, the process proceeds to step S10, where a flush operation is performed. In step S11, the interval is encoded according to Table 1 below and output. In step S12, the base pointer is moved to the match address (base pointer = match address), and the process proceeds to step S16.

[0038]

[Table 1]

[0039] If the result of step S6 is NO, that is, if the "current character" does not match any character in the slide dictionary, the process proceeds to step S13, where a flush operation is performed. In step S14, "1" + "current character" is output. In step S15, the base pointer is incremented (base pointer = base pointer + 1), and the process proceeds to step S16.

[0040] In step S16, the "previous character" is updated to the current character, and the "current character" is written at the address indicated by the write pointer. The write pointer is then incremented (write pointer = write pointer + 1).

[0041] In step S17, it is determined whether to continue compression. If the result is YES, the process returns to step S2, and steps S2 to S17 are repeated. If the result is NO, that is, if compression is to end, the process proceeds to step S18, where a flush operation is performed, and compression ends.

[0042] In the flush operation of steps S10, S13, and S18, if the run length is 1 or more, the run length is encoded according to Table 1 and output, and the run length is reset to 0.
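Steps S1 to S18 can be combined into a single compression loop. Table 1 is not reproduced in this text, so the hypothetical `compress` function below emits symbolic tokens instead of bit codes: `("literal", byte)`, `("interval", k)`, and `("run", count)`. The sketch makes these further assumptions:

- The slide dictionary is a ring buffer of `n` byte values.
- The "previous character" starts as 0.
- The two-character search of step S3 is read as "the byte at the match address equals the current character, and the byte just before it equals the previous character". The match address is taken to be that of the current character.

```python
from typing import List, Tuple

Token = Tuple[str, int]  # ("literal", byte) | ("interval", k) | ("run", count)


def compress(data: bytes, n: int = 16) -> List[Token]:
    """Hypothetical sketch of the FIG. 3 flow (S1-S18) with symbolic tokens."""
    slide = [0] * n              # S1: zero-filled slide dictionary (ring buffer)
    write_ptr = base = run = 0   # S1: pointers and run length start at 0
    prev = 0                     # assumption: no "previous character" yet -> 0
    out: List[Token] = []

    def flush() -> None:
        # S10 / S13 / S18: emit a pending run of interval-1 matches, then reset it.
        nonlocal run
        if run >= 1:
            out.append(("run", run))
            run = 0

    def search(c: int, with_prev: bool) -> int:
        # Go once around the dictionary, starting just right of the base pointer.
        # Returns the interval k (1..n) of the first hit, or 0 if nothing matches.
        for k in range(1, n + 1):
            addr = (base + k) % n
            # slide[addr - 1] is the cyclic predecessor (index -1 wraps in Python).
            if slide[addr] == c and (not with_prev or slide[addr - 1] == prev):
                return k
        return 0

    for c in data:                                   # S2: one character
        k = search(c, True) or search(c, False)      # S3-S6
        if k:                                        # S7: interval found
            if k == 1:
                run += 1                             # S8 YES -> S9
            else:
                flush()                              # S8 NO -> S10
                out.append(("interval", k))          # S11
            base = (base + k) % n                    # S12: base -> match address
        else:
            flush()                                  # S13
            out.append(("literal", c))               # S14: "1" + character
            base = (base + 1) % n                    # S15
        prev = c                                     # S16: update previous char,
        slide[write_ptr] = c                         #      write at write pointer,
        write_ptr = (write_ptr + 1) % n              #      advance write pointer
    flush()                                          # S18
    return out
```

With `n = 8`, the input `b"ABABAB"` compresses to two literals, one interval of 6, and a single run of three interval-1 matches. This shows how the pair search in step S3 tends to keep later characters at interval 1.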

[0043] As shown in the flowchart of FIG. 4, decompression by the compression/decompression unit 12 begins with step S21. In step S21, the slide dictionary is initialized by filling it with zeros, and the write pointer, the base pointer, and the run length are set to 0. In step S22, compressed data is input.

[0044] In step S23, it is determined whether the MSB of the input compressed data is 1.

[0045] If the result of step S23 is YES, that is, if the MSB of the input compressed data is 1, the next 8 bits are output as a character in step S24. In step S25, the base pointer is incremented (base pointer = base pointer + 1), and the process proceeds to step S30.

[0046] If the result of step S23 is NO, that is, if the MSB of the input compressed data is not 1, the process proceeds to step S26, where the data is decoded according to Table 1. In step S27, it is determined whether the decoded result is an interval.

[0047] If the result of step S27 is YES, that is, if the decoded result is an interval, the character at base pointer + interval is output in step S28. In step S29, the base pointer is incremented (base pointer = base pointer + 1), and the process proceeds to step S30. In step S30, the character is written at the address indicated by the write pointer, and the write pointer is incremented (write pointer = write pointer + 1). The process then proceeds to step S35.

[0048] If the result of step S27 is NO, that is, if the decoded result is not an interval (and is therefore a run length), the process proceeds to step S31, where the character at base pointer + 1 is output. In step S32, the base pointer is incremented (base pointer = base pointer + 1) and the run length is decremented (run length = run length - 1). In step S33, the character is written at the address indicated by the write pointer, and the write pointer is incremented (write pointer = write pointer + 1).

[0049] In step S34, it is determined whether the run length is 0.

[0050] If the result of step S34 is YES, that is, if the run length is 0, the process proceeds to step S35.

[0051] If the result of step S34 is NO, that is, if the run length is not 0, the process returns to step S31. Steps S31 to S34 are repeated until the run length becomes 0.

[0052] In step S35, it is determined whether to continue decompression. If the result is YES, the process returns to step S22, and steps S22 to S35 are repeated to decompress the subsequent compressed data in turn. If the result is NO, decompression ends.
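A decompression sketch for steps S21 to S35 can consume the symbolic tokens a compression sketch would produce, namely `("literal", byte)`, `("interval", k)`, and `("run", count)`, in place of the Table 1 bit codes. The function name and token format are assumptions. One deliberate departure from the text: after an interval, this sketch moves the base pointer to base pointer + interval, the address of the character just output. This mirrors step S12 of the compression flow. Setting it to base pointer + 1, as step S29 reads, would appear to stay in step with the compressor only when the interval is 1.

```python
from typing import List, Tuple

Token = Tuple[str, int]  # ("literal", byte) | ("interval", k) | ("run", count)


def decompress(tokens: List[Token], n: int = 16) -> bytes:
    """Hypothetical sketch of the FIG. 4 flow (S21-S35) on symbolic tokens."""
    slide = [0] * n          # S21: zero-filled slide dictionary (ring buffer)
    write_ptr = base = 0     # S21: pointers start at 0
    out = bytearray()

    def emit(c: int, new_base: int) -> None:
        # Output a character, move the base pointer, then write the character
        # into the dictionary at the write pointer (S30 / S33).
        nonlocal write_ptr, base
        out.append(c)
        base = new_base
        slide[write_ptr] = c
        write_ptr = (write_ptr + 1) % n

    for kind, value in tokens:                   # S22: next compressed item
        if kind == "literal":                    # S23-S25: MSB 1, 8-bit character
            emit(value, (base + 1) % n)
        elif kind == "interval":                 # S27-S29
            addr = (base + value) % n
            emit(slide[addr], addr)              # assumption: base -> base + interval
        elif kind == "run":                      # S31-S34: repeat until run is 0
            for _ in range(value):
                addr = (base + 1) % n
                emit(slide[addr], addr)
        else:
            raise ValueError(f"unknown token kind {kind!r}")
    return bytes(out)
```

For example, with `n = 8`, the tokens `[("literal", 65), ("literal", 66), ("interval", 6), ("run", 3)]` decode to `b"ABABAB"`.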

[0053] A specific example of compression by the compression/decompression unit 12 of the tape streamer drive 10 is described next. The example assumes that the slide dictionary and pointer are in the state shown in FIG. 5(A): the character string "ABCDEFGHIJ" has been written in the slide dictionary, and the base pointer indicates address 0 of the slide dictionary. The character string "CDEJ" is to be encoded.

[0054] Encoding using intervals is described first.

[0055] The first character "C" of the string "CDEJ" is input in step S2, and the slide dictionary is searched for "C" in steps S3 to S6. The search covers every address of the slide dictionary, going once around starting from the address immediately to the right of the one indicated by the base pointer. The character "C" is found at address 2 of the slide dictionary. Since the base pointer is at address 0, step S7 calculates interval 2 (2 - 0 = 2). In steps S10 and S11, interval 2 is encoded as "0010" according to Table 1. Then, as shown in FIG. 5(B), the base pointer is moved to address 2 in step S12.

[0056] Next, the second character "D" of the string "CDEJ" is input in step S2, and the slide dictionary is searched for "D" in steps S3 to S6. The character "D" is found at address 3 of the slide dictionary. Since the base pointer is at address 2, step S7 calculates interval 1 (3 - 2 = 1). In steps S10 and S11, interval 1 is encoded as "01" according to Table 1. Then, as shown in FIG. 5(C), the base pointer is moved to address 3 in step S12.

[0057] Next, the third character "E" of the string "CDEJ" is input in step S2, and the slide dictionary is searched for "E" in steps S3 to S6. The character "E" is found at address 4 of the slide dictionary. Since the base pointer is at address 3, step S7 calculates interval 1 (4 - 3 = 1). In steps S10 and S11, interval 1 is encoded as "01" according to Table 1. Then, as shown in FIG. 5(D), the base pointer is moved to address 4 in step S12.

【００５８】[0058]

Finally, the fourth character "J" of the string "CDEJ" is input in step S2, and the slide dictionary is searched for "J" in steps S3 to S6. The character "J" is found at address 9 of the slide dictionary. Since the base pointer is at address 4, interval 5 is calculated in step S7 as 9 - 4 = 5. In steps S10 and S11, interval 5 is encoded as "000101" according to Table 1. Then, as shown in FIG. 5(E), the base pointer is moved to address 9 in step S12.

【００５９】[0059]

Thus, using intervals, the string "CDEJ" is represented as "2, 1, 1, 5", which is encoded according to Table 1 as "0010, 01, 01, 000101".
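Table 1 itself is not reproduced in this section, but the three interval codewords quoted above (1 → "01", 2 → "0010", 5 → "000101") each match a leading 0 followed by the Elias gamma code of the interval, which fits the gamma encoding part 124 named in the abstract. The sketch below assumes that layout; it is an inference from the example, and `elias_gamma` and `interval_codeword` are illustrative names rather than anything defined in the patent.

```python
def elias_gamma(n: int) -> str:
    """Elias gamma code: (bit length - 1) zeros, then n in binary."""
    if n < 1:
        raise ValueError("the gamma code is defined for n >= 1")
    binary = format(n, "b")
    return "0" * (len(binary) - 1) + binary


def interval_codeword(interval: int) -> str:
    """Assumed Table 1 layout: a 0 flag bit followed by gamma(interval)."""
    return "0" + elias_gamma(interval)


print([interval_codeword(i) for i in [2, 1, 1, 5]])
# ['0010', '01', '01', '000101']
```

Under this assumption, interval 15 takes 8 bits while interval 16 would take 10 bits, one more than a 9-bit literal, which is consistent with Table 1 stopping at 15.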

【００６０】[0060]

Because interval 1 can occur several times in a row in interval-based encoding, such a run of interval 1s is replaced with a run length and then encoded according to Table 1 in steps S10 and S11.

【００６１】[0061]

That is, since the string "CDEJ" is represented by the intervals "2, 1, 1, 5", it is rewritten as "2, run length 2, 5" and encoded according to Table 1 as "0010, 0101, 000101". As shown in Table 1, run lengths 2 to 5 are expressed using interval 1.
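The run-length substitution can be sketched as a pass over the interval sequence that emits tokens. This is a hypothetical illustration: the token names are invented, and the cap of 5 assumes Table 1 defines run lengths only from 2 to 5, so longer runs are split. The actual run-length codewords of Table 1 are not reproduced here, so the sketch stops at the token level.

```python
def group_runs(intervals: list[int], max_run: int = 5) -> list[tuple[str, int]]:
    """Replace runs of consecutive interval 1s with run-length tokens.

    Returns ("interval", n) and ("run", length) tokens. A lone interval 1 is
    kept as an interval. Runs longer than max_run are split (assumption:
    Table 1 defines run lengths 2 to 5 only).
    """
    tokens: list[tuple[str, int]] = []
    i = 0
    while i < len(intervals):
        if intervals[i] != 1:
            tokens.append(("interval", intervals[i]))
            i += 1
            continue
        run = 0
        while i < len(intervals) and intervals[i] == 1 and run < max_run:
            run += 1
            i += 1
        tokens.append(("run", run) if run >= 2 else ("interval", 1))
    return tokens


print(group_runs([2, 1, 1, 5]))
# [('interval', 2), ('run', 2), ('interval', 5)]
```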

【００６２】[0062]

The code compressed in this way, "0010, 0101, 000101", is 14 bits long. With a character code length of 8 bits, the data has been compressed to 14 bits / (8 bits × 4) = 43.75% of its original size.

【００６３】[0063]

As shown in Table 1, intervals are defined only up to 15. When the interval would be 16 or more, the character itself is output as a codeword instead of the encoded interval, with a 1 prepended to distinguish it from interval and run-length codewords. For example, the ASCII code of the letter "Z" is 0x5A (01011010), so its codeword is "101011010".
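The literal codeword described above, together with the length comparison recited in claim 3, can be sketched as follows. This is a minimal sketch under two assumptions: characters are 8-bit codes, and interval codewords are a 0 followed by the Elias gamma code of the interval (a layout inferred from the example codewords in [0055] to [0058], not stated in the text). `literal_codeword` and `choose_codeword` are hypothetical names.

```python
from typing import Optional


def literal_codeword(ch: str) -> str:
    """A 1 flag bit followed by the 8-bit character code."""
    code = ord(ch)
    if code > 0xFF:
        raise ValueError("this sketch assumes 8-bit character codes")
    return "1" + format(code, "08b")


def interval_codeword(interval: int) -> str:
    """Assumed interval layout: a 0 flag bit followed by Elias gamma(interval)."""
    if interval < 1:
        raise ValueError("intervals start at 1")
    binary = format(interval, "b")
    return "0" + "0" * (len(binary) - 1) + binary


def choose_codeword(interval: Optional[int], ch: str) -> str:
    """Emit the shorter of the interval and literal codewords (cf. claim 3).

    interval is None when the character is not in the dictionary.
    """
    literal = literal_codeword(ch)
    if interval is None:
        return literal
    by_interval = interval_codeword(interval)
    return by_interval if len(by_interval) < len(literal) else literal


print(literal_codeword("Z"))  # 101011010
```

With that assumed layout, comparing lengths reproduces the cutoff in the text: interval 15 (8 bits) beats the 9-bit literal, while interval 16 (10 bits) does not.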

【００６４】[0064]

When a character that is not in the dictionary is output as a codeword, converting that codeword into a BSTW code or a Huffman code further improved the overall compression ratio.

【００６５】[0065]

With this tape streamer drive 10, data compression and decompression can each be performed in a single pass, and a sufficient compression ratio was obtained with a slide dictionary as small as about 512 bytes: about 56% for text files and about 46% for executable program files. The apparent capacity per tape can therefore be increased. In addition, because the processing is single-pass, it can be implemented in software and can also be realized easily in hardware.

【００６６】[0066]

[Effect of the invention]

As described above, the present invention can provide a data compression method, a data compression/decompression method, a data compression device, and a data compression/decompression device that process data in a single pass, obtain a sufficient compression ratio with a small dictionary, and are easy to implement in hardware.

[Brief description of the drawings]

FIG. 1 is a block diagram illustrating the configuration of a tape streamer drive to which the present invention is applied.

FIG. 2 is a block diagram illustrating the functional configuration of the compression/decompression unit in the tape streamer drive.

FIG. 3 is a flowchart illustrating the procedure of the compression process executed by the compression/decompression unit.

FIG. 4 is a flowchart illustrating the procedure of the decompression process executed by the compression/decompression unit.

FIG. 5 is a diagram used to describe a specific example of the compression process performed by the compression/decompression unit.

[Explanation of symbols]

10 tape streamer drive, 11 SCSI interface, 12 compression/decompression processing unit, 13 IF/ECC controller, 14 buffer memory, 15 recording RF processing unit, 16 playback RF processing unit, 17 rotary transformer, 18A, 18B recording heads, 19A, 19B playback heads, 20 host computer, 121 slide dictionary unit, 122 base pointer unit, 123 interval calculation unit, 124 gamma encoding unit

Claims

[Claims]

1. A data compression method using a slide dictionary method, in which read data is registered in a dictionary, the longest matching character string is searched for in the dictionary, and the original code is replaced with the match position and match length and output, the method being characterized in that dictionary lookup during encoding starts from the position of a base pointer, the interval from the base pointer to the first matching character is encoded, and the base pointer is moved to the position of the matching character before the next character is encoded.

2. The data compression method according to claim 1, wherein, when the interval is encoded, a smaller interval is encoded into a shorter codeword.

3. The data compression method according to claim 1, wherein the word length of the codeword for the interval is compared with the word length of the character, and the shorter code is output.

4. The data compression method according to claim 3, wherein, when a character is encoded and output, the method selectively switches between encoding and compressing the interval from the base pointer to the first matching character and another compression method.

5. The data compression method according to claim 1, wherein, when interval 1 occurs consecutively, the run of interval 1s is encoded as a run length.

6. The data compression method according to claim 1, wherein, when the base pointer is moved, it is moved to the position at which the immediately preceding input character matched the character to be encoded.

7. A data compression/decompression method using a slide dictionary method, in which read data is registered in a dictionary, the longest matching character string is searched for in the dictionary, and the original code is replaced with the match position and match length and output, characterized in that: at the time of compression, encoding is performed by repeating a process of starting the dictionary lookup from the position of a base pointer, encoding the interval from the base pointer to the first matching character, and moving the base pointer to the position of the matching character before the next character is encoded; and at the time of decompression, each encoded character is decoded by repeating a process of starting the dictionary lookup from the position of the base pointer, decoding the interval from the base pointer to the first matching character, and moving the base pointer to the position of the matching character before the next character is decoded.
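Decompression as recited in claim 7 mirrors the encoder: each decoded interval is added to the base pointer, wrapping around the dictionary, to find the character, and the base pointer then moves there. The following is a minimal sketch, assuming interval codewords are a 0 followed by the Elias gamma code (a layout inferred from the worked example in [0055] to [0058]) and handling interval codewords only; literal and run-length codewords are omitted, and the function names are hypothetical.

```python
def parse_interval_codewords(bits: str) -> list[int]:
    """Split a string of assumed '0' + Elias gamma codewords into intervals."""
    intervals = []
    i = 0
    while i < len(bits):
        if bits[i] != "0":
            raise ValueError("only interval codewords are handled in this sketch")
        i += 1                                   # skip the 0 flag bit
        zeros = 0
        while i < len(bits) and bits[i] == "0":  # gamma prefix length
            zeros += 1
            i += 1
        if i + zeros + 1 > len(bits):
            raise ValueError("truncated codeword")
        intervals.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return intervals


def decode_intervals(dictionary: str, base: int, intervals: list[int]) -> str:
    """Recover characters: step from the base pointer, then move the pointer."""
    size = len(dictionary)
    out = []
    for interval in intervals:
        addr = (base + interval) % size          # wrap around the dictionary
        out.append(dictionary[addr])
        base = addr                              # base pointer follows the match
    return "".join(out)


intervals = parse_interval_codewords("00100101000101")
print(intervals)                                     # [2, 1, 1, 5]
print(decode_intervals("ABCDEFGHIJ", 0, intervals))  # CDEJ
```

Because the decoder moves its base pointer exactly as the encoder did, no positions need to be transmitted, only the intervals.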

8. A data compression device that performs data compression by a slide dictionary method, in which read data is registered in a dictionary, the longest matching character string is searched for in the dictionary, and the original code is replaced with the match position and match length and output, the device comprising: a slide dictionary unit in which read data is registered as a dictionary; a base pointer that designates, as the reference position in the slide dictionary unit for encoding, the position at which the previous character matched; an interval calculation unit that calculates the interval from the base pointer to the first matching character; and an encoding unit that encodes the interval calculated by the interval calculation unit.

9. The data compression apparatus according to claim 8, wherein, when encoding the interval, the encoding unit encodes a smaller interval into a shorter codeword.

10. The data compression apparatus according to claim 8, wherein the encoding unit compares the word length of the codeword for the interval with the word length of the character and outputs the shorter code.

11. The data compression apparatus according to claim 8, wherein, when interval 1 occurs consecutively, the encoding unit encodes the run of interval 1s as a run length.

12. The data compression apparatus according to claim 8, wherein, when the base pointer is moved, it is moved to the position at which the immediately preceding character matched the character to be encoded.

13. A data compression/decompression device that performs data compression and decompression by a slide dictionary method, in which read data is registered in a dictionary, the longest matching character string is searched for in the dictionary, and the original code is replaced with the match position and match length and output, the device comprising: compression processing means for performing encoding by repeating a process of starting the dictionary lookup from the position of a base pointer, encoding the interval from the base pointer to the first matching character, and moving the base pointer to the position of the matching character before the next character is encoded; and decompression processing means for decoding each encoded character by repeating a process of starting the dictionary lookup from the position of the base pointer, decoding the interval from the base pointer to the first matching character, and moving the base pointer to the position of the matching character before the next character is decoded.