JPH0659857A

JPH0659857A - Data compressor

Info

Publication number: JPH0659857A
Application number: JP20998692A
Authority: JP
Inventors: Ryuichi Shiomi; 隆一塩見
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1992-08-06
Filing date: 1992-08-06
Publication date: 1994-03-04

Abstract

PURPOSE: To achieve a high compression ratio by treating the character storage section as a multidimensional space, so that matching character strings can be read in a given direction and more distinct strings can be coded with a character storage section of the same size. CONSTITUTION: A character string match determination section 12 determines the position, length, and direction of the Japanese character group in the character storage section 11 that gives the longest match with the beginning of the input Japanese character string 16, and outputs the result. A coding section 13 encodes the output of the match determination section 12 into binary; Huffman coding may be used for this step. Data compression is achieved by this coding. A first decoding section 14 receives the coded character string 17 and decodes it into the position, length, and direction of the Japanese character group in the character storage section 11. A second decoding section 15 then reconstructs the decoded character string from the character storage section 11 using that position, length, and direction.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a data compression apparatus that encodes an input character string into an encoded character string and decodes the encoded character string back into the input character string.

[0002]

2. Description of the Related Art

In recent years, data compression encoding/decoding apparatuses have come into use because large amounts of information must be processed. Examples include apparatuses based on the LZSS and LZW methods ("Must-Read: Data Compression Algorithms and Practice" [必読・データ圧縮のアルゴリズムと実践], The Basic, March 1989, pp. 1-65). FIG. 5 shows an example block diagram of such an apparatus, and its operation is described step by step below. The character string match determination unit 52 compares the input character string 57 with the second character string held in the character storage unit 51, and the encoding unit 53 encodes the comparison result into encoded characters 58. Each time an encoded character 58 is produced, the character storage unit updating unit 56 updates the second character string in the character storage unit 51 using the input character string 57 and the comparison result. On the decoding side, the first decoding unit 54 decodes each encoded character 58 back into the comparison result (the result of the match determination unit 52 comparing the input character string 57 with the second character string in the character storage unit 51), and the second decoding unit 55 decodes that comparison result into the decoded character string 59. Each time an encoded character 58 is decoded, the character storage unit updating unit 56 updates the second character string in the character storage unit 51 using the comparison result and the decoded character string 59.
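For contrast with the multidimensional store introduced later, the conventional one-dimensional scheme can be sketched as a greedy LZSS-style tokenizer. This is a generic illustration, not the implementation from the cited article: the sliding window plays the role of the "second character string" in storage unit 51, and the updating step of unit 56 is implicit because the window is simply the text already processed. The names `lzss_tokenize`, `window`, and `min_len` are hypothetical.

```python
def lzss_tokenize(text, window=4096, min_len=2):
    """Greedy LZSS-style tokenizer over a 1-D sliding window.

    Emits ("match", distance, length) when at least min_len characters match
    somewhere in the previous `window` characters, else ("lit", ch).
    """
    tokens = []
    i = 0
    while i < len(text):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # The match may run past position i (overlapping copy), as in LZ77/LZSS.
            while i + length < len(text) and text[j + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_len:
            tokens.append(("match", best_dist, best_len))
            i += best_len
        else:
            tokens.append(("lit", text[i]))
            i += 1
    return tokens


def lzss_detokenize(tokens):
    """Rebuild the text; the output produced so far doubles as the dictionary."""
    out = []
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):
                out.append(out[-dist])  # character-by-character copy handles overlap
    return "".join(out)
```

Here the dictionary can only be read in one direction (forward along the text), which is the limitation the invention addresses.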

[0003]

Problems to Be Solved by the Invention

However, achieving a high compression ratio with the processing described above requires a large-capacity character storage unit. To solve this problem, the present invention treats the character storage unit as a multidimensional space and reads the characters referenced for matching along multiple directions. With a character storage unit of the same size, more kinds of character strings in the data to be compressed can then be replaced by references. An object of the present invention is to provide a data compression apparatus that achieves a high compression ratio in this way.

[0004]

Means for Solving the Problems

A data compression apparatus according to the present invention comprises: a character storage unit that stores a character group; a character string match determination unit that determines the position, length, and direction at which the input character string matches the character group in the character storage unit; an encoding unit that encodes that position, length, and direction into an encoded character string; a first decoding unit that decodes the encoded character string into the position, length, and direction of the character group in the character storage unit; and a second decoding unit that decodes the position, length, and direction of the character group in the character storage unit into a decoded character string.

[0005]

Operation

By making the character storage unit multidimensional, the present invention can encode and decode at a high compression ratio.

[0006]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

A data compression apparatus according to one embodiment of the present invention is described below with reference to the drawings.

[0007] FIG. 1 shows the configuration of the data compression apparatus according to the embodiment. In FIG. 1, 11 is a character storage unit that stores a character group; 12 is a character string match determination unit that determines the start position at which the input character string matches the character group in the character storage unit 11, the length of the match, and the direction of the matching string; 13 is an encoding unit that encodes the position, length, and direction of the character group in the character storage unit 11; 14 is a first decoding unit that decodes the encoded character string into the position, length, and direction of the character group in the character storage unit 11; 15 is a second decoding unit that decodes the position, length, and direction of the character group in the character storage unit 11 into the character string as it was before encoding; 16 is the input character string to be encoded; 17 is the encoded character string; and 18 is the input character string recovered by decoding the encoded character string.

[0008] For the data compression apparatus configured as above, encoding of an input Japanese character string is described with reference to FIG. 2 and decoding of the encoded character string with reference to FIG. 4, using FIG. 3 as a concrete example.

[0009] FIG. 2 is a flowchart showing the operation of the data compression apparatus of FIG. 1 during encoding.

[0010] First, the input Japanese character string 16 is supplied to the character string match determination unit 12 (201). For the longest match with the beginning of the input Japanese character string 16, the match determination unit 12 determines the start position of the matching Japanese character group in the character storage unit 11, the length over which the characters match, and the direction in which they match (202). FIG. 3(B) shows the input character string 16, and FIG. 3(A) shows an example of the character group in the character storage unit 11, which holds sequences of frequently occurring characters. For the input string 「見ていながら無視した」 (mite i nagara mushi shita, "ignored it even while seeing it"), the substring 「見て」 (mite) matches the character group at character position (4, 4) with match length 2 and match direction +x, and the match determination unit 12 outputs this information. Next, applying the same procedure to the remainder of the input, 「いなが」 (inaga), the unit 12 outputs character position (14, 3), match length 3, and match direction -y. The unit 12 then compares 「ら」 (ra) with the character group in the character storage unit 11. Because 「ら」 does not appear in the character group, the unit outputs information indicating that it is absent, together with the character code of 「ら」 (for example, its JIS code). For the remaining 「無視した」 (mushi shita), the unit 12 follows the same procedure and outputs the position, length, and direction of the two matches 「無視」 and 「した」. FIG. 3(D) summarizes the output of the match determination unit 12 for the input string 「見ていながら無視した」.
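The longest-match search described in [0010] can be sketched as a brute-force scan of every cell and every direction of a two-dimensional store. This is a minimal sketch under stated assumptions, not the patented implementation: the store is 16 × 8 (the largest size addressable by the 4-bit X and 3-bit Y fields of FIG. 3(C)), coordinates are 0-based with `grid[y][x]`, "-y" means decreasing row index, and only the cells for 「見て」 and 「いなが」 follow the text; the positions of 「無視」 and 「した」, the filler character, and all function names are hypothetical. Ties are broken by scan order, which the source does not specify.

```python
W, H = 16, 8        # 4-bit X and 3-bit Y coordinates, per the code table
MAX_LEN = 7         # largest value a 3-bit length field can hold
FILL = "・"         # hypothetical filler for unused cells
DIRS = {"+x": (1, 0), "-x": (-1, 0), "+y": (0, 1), "-y": (0, -1)}


def build_grid():
    """Hypothetical 16x8 store; grid[y][x] holds one character."""
    grid = [[FILL] * W for _ in range(H)]

    def put(x, y, word, d):
        dx, dy = DIRS[d]
        for k, ch in enumerate(word):
            grid[y + dy * k][x + dx * k] = ch

    put(4, 4, "見て", "+x")     # position/direction given in the text
    put(14, 3, "いなが", "-y")  # position/direction given in the text
    put(0, 0, "無視", "+x")     # assumed; FIG. 3(A) is not reproduced here
    put(7, 6, "した", "+y")     # assumed
    return grid


def longest_match(grid, text, start):
    """Return (length, x, y, direction) of the longest run matching text[start:], or None."""
    best = None
    for y in range(H):
        for x in range(W):
            for d, (dx, dy) in DIRS.items():
                n, cx, cy = 0, x, y
                while (n < MAX_LEN and start + n < len(text)
                       and 0 <= cx < W and 0 <= cy < H
                       and grid[cy][cx] == text[start + n]):
                    n += 1
                    cx += dx
                    cy += dy
                if n > 0 and (best is None or n > best[0]):
                    best = (n, x, y, d)
    return best


def tokenize(grid, text):
    """Greedy parse into ("match", x, y, length, dir) and ("lit", ch) tokens."""
    tokens, i = [], 0
    while i < len(text):
        m = longest_match(grid, text, i)
        if m is None:
            tokens.append(("lit", text[i]))  # character absent from the store
            i += 1
        else:
            n, x, y, d = m
            tokens.append(("match", x, y, n, d))
            i += n
    return tokens
```

With this grid, `tokenize(build_grid(), "見ていながら無視した")` yields the five outputs listed in [0010]: two matches, the literal 「ら」, and two more matches. A 128-cell store makes the exhaustive scan cheap; a larger store would call for an index.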

[0011] The encoding unit 13 encodes the output of the match determination unit 12 (205, 206, 207). FIG. 3(C) is an example of a code table for the character position, match length, and match direction: the character position is encoded with 4 bits for the X coordinate and 3 bits for the Y coordinate, the length with 3 bits, and the direction with 2 bits, for 12 bits per match.
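The field layout of FIG. 3(C) can be sketched as fixed-width binary fields concatenated in the order X, Y, length, direction. Only the direction code +x = 00 is fixed by the worked example; the codes for -x, -y, and +y below are assumptions, and `pack_match` and `field` are hypothetical names.

```python
X_BITS, Y_BITS, LEN_BITS, DIR_BITS = 4, 3, 3, 2  # field widths from FIG. 3(C)

# Only "+x" -> 00 is confirmed by the worked example; the other three are assumed.
DIR_CODE = {"+x": 0b00, "-x": 0b01, "-y": 0b10, "+y": 0b11}


def field(value, width):
    """Fixed-width big-endian binary string; rejects values that do not fit."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")
    return format(value, f"0{width}b")


def pack_match(x, y, length, direction):
    """Concatenate X | Y | length | direction into one 12-bit match code."""
    return (field(x, X_BITS) + field(y, Y_BITS)
            + field(length, LEN_BITS) + field(DIR_CODE[direction], DIR_BITS))
```

For the first match in the FIG. 3 example (position (4, 4), length 2, direction +x) this produces `0100 100 010 00`, i.e. `010010001000`.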

[0012] In the example of FIG. 3, the encoding unit 13 encodes the first output of the match determination unit 12, namely character position (4, 4), match length 2, and direction +x, into 010010001000 according to the code tables (0100 for X = 4, 100 for Y = 4, 010 for length 2, and 00 for +x). The remaining outputs are encoded in the same way, except that for 「ら」, which does not exist in the character storage unit, the unit outputs the code 111111111111 indicating absence, followed by the character code of 「ら」 (for example, its JIS code) (208, 209).
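The complete bit stream, including the all-ones escape followed by a 16-bit character code, can be sketched as follows. Assumptions: the JIS X 0208 code is derived from Python's `euc_jp` codec (EUC-JP stores each JIS byte with its high bit set), the direction codes other than +x = 00 are assumed, and the coordinates of 「無視」 and 「した」 are hypothetical because FIG. 3(A) is not reproduced. `jis_code`, `encode_tokens`, and `TOKENS` are illustrative names.

```python
ESCAPE = "1" * 12  # "not in the character store" code from [0012]
DIR_CODE = {"+x": 0b00, "-x": 0b01, "-y": 0b10, "+y": 0b11}  # only +x=00 is confirmed


def jis_code(ch):
    """16-bit JIS X 0208 code of a character, derived from its EUC-JP bytes."""
    b = ch.encode("euc_jp")
    if len(b) != 2 or not (0xA1 <= b[0] <= 0xFE and 0xA1 <= b[1] <= 0xFE):
        raise ValueError(f"{ch!r} is not a JIS X 0208 character")
    return ((b[0] & 0x7F) << 8) | (b[1] & 0x7F)  # clear the EUC-JP high bits


def encode_tokens(tokens):
    """Concatenate 12-bit match codes and escape + 16-bit literal codes."""
    bits = []
    for tok in tokens:
        if tok[0] == "lit":
            bits.append(ESCAPE + format(jis_code(tok[1]), "016b"))
        else:
            _, x, y, length, d = tok
            bits.append(format(x, "04b") + format(y, "03b")
                        + format(length, "03b") + format(DIR_CODE[d], "02b"))
    return "".join(bits)


TOKENS = [
    ("match", 4, 4, 2, "+x"),   # 見て (from the text)
    ("match", 14, 3, 3, "-y"),  # いなが (from the text)
    ("lit", "ら"),              # absent from the store
    ("match", 0, 0, 2, "+x"),   # 無視 (hypothetical position)
    ("match", 7, 6, 2, "+y"),   # した (hypothetical position)
]
```

`encode_tokens(TOKENS)` is 76 bits long. With the assumed assignment +y = 11, the escape pattern would decode as X = 15, Y = 7, length 7, direction +y, a run that cannot fit in a 16 × 8 store, so the escape never collides with a real match; the source does not say how it avoids that collision.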

[0013] A Japanese character such as a kanji is generally represented with 16 bits. As a result of the encoding above, the input 「見ていながら無視した」 was compressed from 160 bits to 76 bits. In particular, 「いなが」 could be encoded in 12 bits because the character storage unit is two-dimensional, so the string can be read along the -y direction.
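The 160-to-76-bit figure follows from the numbers given above: the input has 10 characters at 16 bits each; the four matches (見て, いなが, 無視, した) cost one 12-bit code each; and the absent ら costs the 12-bit escape code plus a 16-bit character code.

```latex
\[
\begin{aligned}
\text{uncompressed} &= 10 \times 16 = 160\ \text{bits},\\
\text{compressed}   &= \underbrace{4 \times 12}_{\text{four matches}}
                     + \underbrace{(12 + 16)}_{\text{escape} + \text{character code}}
                     = 48 + 28 = 76\ \text{bits},\\
\text{ratio}        &= 76 / 160 = 0.475 .
\end{aligned}
\]
```

The three characters of いなが alone drop from $3 \times 16 = 48$ bits to a single 12-bit code.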

[0014] Applying Huffman coding to this encoding makes it possible to compress the encoded characters further.
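The source does not say what the Huffman coder's symbols are. A minimal sketch, assuming each 12-bit match code is treated as one symbol, builds a standard Huffman code from symbol frequencies with `heapq`; the symbol stream in the usage note is hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(symbols):
    """Build a prefix-free code {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:                    # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    order = count()                       # tie-breaker so dicts are never compared
    heap = [(f, next(order), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(order), merged))
    return heap[0][2]
```

For a hypothetical stream in which one 12-bit code occurs 5 times, another twice, and two others once each, the resulting code lengths are 1, 2, 3, and 3 bits, so the 9 tokens take 15 bits instead of 108. The decoder must know the same table, for example a fixed table built from typical text.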

[0015] FIG. 4 is a flowchart showing the operation of the data compression apparatus of FIG. 1 during decoding.

[0016] First, the encoded character string 17 is input to the first decoding unit 14 (41). The first decoding unit 14 decodes the encoded character string 17 into the start position, length, and direction of the Japanese character group in the character storage unit 11 (44, 45, 46). For each 12-bit unit of the encoded character string 17 in FIG. 3, the character position is decoded from the first 4 bits and the next 3 bits using the code table (33), the length from the next 3 bits, and the direction from the remaining 2 bits. If the 12 bits are 111111111111, however, the character code itself follows, and it is extracted (48).
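The parsing performed by the first decoding unit can be sketched as reading 12-bit units and switching to a 16-bit literal after the all-ones escape. Assumptions: a 16-bit character code follows the escape (as in the JIS example), and the direction codes other than 00 = +x are assumed. `take` and `parse_tokens` are hypothetical names.

```python
X_BITS, Y_BITS, LEN_BITS, DIR_BITS = 4, 3, 3, 2
TOKEN_BITS = X_BITS + Y_BITS + LEN_BITS + DIR_BITS   # 12
ESCAPE = "1" * TOKEN_BITS
CHAR_BITS = 16                                       # e.g. a JIS code
DIR_NAME = {0b00: "+x", 0b01: "-x", 0b10: "-y", 0b11: "+y"}  # only 00 -> +x is confirmed


def take(bits, pos, width):
    """Return (integer value, new position) for the next `width` bits."""
    chunk = bits[pos:pos + width]
    if len(chunk) != width:
        raise ValueError("truncated encoded string")
    return int(chunk, 2), pos + width


def parse_tokens(bits):
    """Split an encoded bit string into match and literal tokens."""
    tokens, pos = [], 0
    while pos < len(bits):
        if bits[pos:pos + TOKEN_BITS] == ESCAPE:
            code, pos = take(bits, pos + TOKEN_BITS, CHAR_BITS)
            tokens.append(("lit", code))          # raw character code follows
            continue
        x, pos = take(bits, pos, X_BITS)
        y, pos = take(bits, pos, Y_BITS)
        length, pos = take(bits, pos, LEN_BITS)
        d, pos = take(bits, pos, DIR_BITS)
        tokens.append(("match", x, y, length, DIR_NAME[d]))
    return tokens
```

Parsing `010010001000` followed by the escape and the code `0x2469` gives a match at (4, 4), length 2, +x, followed by a literal character code.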

[0017] The second decoding unit 15 uses the position, length, and direction decoded by the first decoding unit 14 to recover the input character string from the character storage unit 11 (47). In the example of FIG. 3, 「見て」 is decoded from character position (4, 4), length 2, and direction +x. The remaining characters are decoded in the same way; when the first decoding unit reports a character that does not exist in the character group, the next item received is a character code, which is output as is.
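The second decoding unit's job reduces to stepping through the store from the start cell along the direction vector. This minimal sketch assumes 0-based `grid[y][x]` indexing with "-y" meaning decreasing row index, fills only the two runs whose coordinates the text gives, and converts literal JIS codes back to characters through Python's `euc_jp` codec. `build_grid`, `read_run`, `jis_to_char`, and `reconstruct` are hypothetical names.

```python
W, H = 16, 8
FILL = "・"  # hypothetical filler for unused cells
DIRS = {"+x": (1, 0), "-x": (-1, 0), "+y": (0, 1), "-y": (0, -1)}


def build_grid():
    """Hypothetical store holding only the two runs whose coordinates the text gives."""
    grid = [[FILL] * W for _ in range(H)]
    for k, ch in enumerate("見て"):
        grid[4][4 + k] = ch          # from (4, 4) along +x
    for k, ch in enumerate("いなが"):
        grid[3 - k][14] = ch         # from (14, 3) along -y
    return grid


def read_run(grid, x, y, length, direction):
    """Read `length` characters starting at (x, y) along `direction`."""
    dx, dy = DIRS[direction]
    return "".join(grid[y + dy * k][x + dx * k] for k in range(length))


def jis_to_char(code):
    """Set the high bit of each JIS byte to form EUC-JP, then decode."""
    return bytes([(code >> 8) | 0x80, (code & 0xFF) | 0x80]).decode("euc_jp")


def reconstruct(grid, tokens):
    """Second decoding step: turn match/literal tokens back into text."""
    out = []
    for tok in tokens:
        if tok[0] == "lit":
            out.append(jis_to_char(tok[1]))   # literal passed through
        else:
            _, x, y, length, d = tok
            out.append(read_run(grid, x, y, length, d))
    return "".join(out)
```

Reconstructing the first three tokens of the example gives 「見ていながら」.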

[0018] Although the input character string in the embodiment is Japanese, the same effect can be obtained for other languages such as English, and for digital numerical sequences formed by arranging the pixel values of an image in series.

[0019] Furthermore, although the character storage unit is two-dimensional in the embodiment, a higher compression ratio can be achieved by making it three- or four-dimensional.
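Generalizing the store to n dimensions can be sketched by representing a direction as a signed unit step along one axis. The source does not say which directions a 3-D or 4-D store would allow; this sketch assumes axis-aligned steps in both signs (2n directions), which needs ⌈log2(2n)⌉ direction bits, and uses a sparse dict keyed by coordinate tuples. All names are hypothetical, and the sketch shows only the addressing cost, not the compression gain claimed in the text.

```python
import math


def axis_directions(ndim):
    """All 2*ndim axis-aligned unit steps: + and - along each axis."""
    dirs = []
    for axis in range(ndim):
        for sign in (1, -1):
            step = [0] * ndim
            step[axis] = sign
            dirs.append(tuple(step))
    return dirs


def direction_bits(ndim):
    """Bits needed for the direction field when all 2*ndim directions are allowed."""
    return math.ceil(math.log2(2 * ndim))


def read_run(store, start, length, step):
    """Read `length` characters from a sparse n-D store (dict: coordinate tuple -> char)."""
    out = []
    for k in range(length):
        coord = tuple(s + k * d for s, d in zip(start, step))
        out.append(store[coord])
    return "".join(out)
```

Under this assumption a 2-D store needs 2 direction bits (matching FIG. 3(C)), and a 3-D or 4-D store needs 3; each added axis also needs its own coordinate field.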

[0020] Also, although the character group in the character storage unit is fixed in the embodiment, it may instead be updated successively during compression.

[0021]

Effects of the Invention

As described above, the data compression apparatus of the present invention comprises a character storage unit that stores a character group; a character string match determination unit that determines the position, length, and direction at which the input character string matches the character group in the character storage unit; an encoding unit that encodes the position, length, and direction of the character group into an encoded character string; a first decoding unit that decodes the encoded character string into the position, length, and direction of the character group in the character storage unit; and a second decoding unit that decodes that position, length, and direction into a decoded character string. This configuration makes it possible to encode and decode at a high compression ratio.

[0022] Furthermore, using Huffman coding in the encoding unit and Huffman decoding in the first decoding unit makes it possible to compress the encoded characters at an even higher ratio.

[0023] Fixing the character group in the character storage unit also makes it possible to decode the encoded character string starting from any encoded-character boundary.
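Why a fixed store permits random access can be sketched directly: each token is decoded using only the store, never earlier output, so decoding may begin at any token boundary. In a sliding-window LZSS scheme, by contrast, a match refers to text produced earlier and cannot be resolved mid-stream. Assumptions: the caller already knows the bit offset of a token boundary (the source does not say how offsets would be located), direction codes other than 00 = +x are assumed, only the two runs given in the text are filled, and all names are hypothetical.

```python
W, H = 16, 8
FILL = "・"
DIRS = {"+x": (1, 0), "-x": (-1, 0), "+y": (0, 1), "-y": (0, -1)}
DIR_NAME = {0b00: "+x", 0b01: "-x", 0b10: "-y", 0b11: "+y"}  # only 00 -> +x is confirmed
ESCAPE = "1" * 12


def build_grid():
    """Hypothetical fixed store with the two runs whose coordinates the text gives."""
    grid = [[FILL] * W for _ in range(H)]
    for k, ch in enumerate("見て"):
        grid[4][4 + k] = ch          # (4, 4) along +x
    for k, ch in enumerate("いなが"):
        grid[3 - k][14] = ch         # (14, 3) along -y
    return grid


def decode_from(grid, bits, offset):
    """Decode starting at any token boundary; the fixed grid is the only state used."""
    out, pos = [], offset
    while pos < len(bits):
        word, pos = bits[pos:pos + 12], pos + 12
        if word == ESCAPE:
            code, pos = int(bits[pos:pos + 16], 2), pos + 16
            out.append(bytes([(code >> 8) | 0x80, (code & 0xFF) | 0x80]).decode("euc_jp"))
        else:
            x, y, n = int(word[0:4], 2), int(word[4:7], 2), int(word[7:10], 2)
            dx, dy = DIRS[DIR_NAME[int(word[10:12], 2)]]
            out.extend(grid[y + dy * k][x + dx * k] for k in range(n))
    return "".join(out)


# Tokens for 見て | いなが | escape + JIS code of ら
BITS = "010010001000" + "111001101110" + ESCAPE + format(0x2469, "016b")
```

Starting at bit 12 yields 「いながら」 without decoding the first token at all. If the store were updated during compression, as permitted in [0020], this independence would be lost.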

[Brief description of drawings]

FIG. 1 is a block diagram of the data compression apparatus according to the embodiment of the present invention.

FIG. 2 is a flowchart showing the flow of the encoding process in the embodiment of the present invention.

FIG. 3: (A) shows the character group in the character storage unit 11 in the embodiment of the present invention; (B) shows the input character string 16; (C) is a table of the codes output by the encoding unit 13; and (D) shows the output of the character string match determination unit 12.

FIG. 4 is a flowchart showing the flow of the decoding process in the embodiment of the present invention.

FIG. 5 is a block diagram of a conventional information encoding/decoding apparatus.

[Explanation of symbols]

11 Character storage unit
12 Character string match determination unit
13 Encoding unit
14 First decoding unit
15 Second decoding unit
16 Input character string
17 Encoded character string
18 Decoded character string

Claims

[Claims]

1. A data compression device comprising: a character storage unit that stores a character group; a character string match determination unit that determines the start position at which an input character string matches the character group in the character storage unit, the length of the match, and the direction of the match; an encoding unit that encodes the position, length, and direction of the character group into an encoded character string; a first decoding unit that decodes the encoded character string into the position, length, and direction of the character group in the character storage unit; and a second decoding unit that decodes the position, length, and direction of the character group in the character storage unit into a decoded character string.

2. The data compression device according to claim 1, wherein the character string match determination unit outputs information indicating that a character in the input character string is not present in the character group in the character storage unit; the encoding unit outputs a first code that encodes this state and a second code that encodes the character not present in the character storage unit; the first decoding unit decodes the first code into that state and the second code into the character of the input character string; and the second decoding unit outputs the character immediately following the first code as a character of the decoded character string.