JPH056260A

JPH056260A - Japanese data compressing system

Info

Publication number: JPH056260A
Application number: JP15658791A
Authority: JP
Inventors: Yoshihisa Aotani; 嘉久青谷; Yoshifumi Okada; 好史岡田
Original assignee: NEC Corp; NEC AccessTechnica Ltd
Current assignee: NEC Platforms Ltd; NEC Corp
Priority date: 1991-06-27
Filing date: 1991-06-27
Publication date: 1993-01-14

Abstract

PURPOSE: To efficiently compress Japanese data composed of two-byte codes with run-length encoding by adding a compression preprocessing step that compresses runs of characters recurring at every other byte. CONSTITUTION: In the compression preprocessing, the data train A to be compressed is read one character at a time and the high-order bytes are compared; the code of the repeated character and the number of repetitions are recorded together with a special character indicating compression, producing preprocessed data B. Compressed data C is then produced by compressing the low-order bytes. The compression ratio is thereby improved compared with data C' compressed by ordinary run-length encoding.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a data compression system, and more particularly to a compression system for Japanese data in which each character consists of 2 bytes.

[0002]

[Prior Art] A conventional data compression system of this type treats a 1-byte character as one unit. When the same character occurs several times in succession, the run is converted into a special character indicating compression, the repeated character data, and the number of repetitions, thereby reducing the data size.

[0003] FIG. 2 shows the conventional data compression system. Data to be compressed 1 is compressed by the ordinary run-length compression processing unit 2 to obtain compressed data 3.

[0004] FIG. 3 is a flowchart showing the conversion algorithm used in compression processing unit 2, described below with reference to the figure. First, the character counter Cc and the repeat counter Cr are set to 0 (steps S1, S2). After one character is read from the original data (step S3), the character counter Cc is incremented by 1 (step S4). The value of the character counter is then compared with the character just read (step S5). In the first cycle this comparison is always true, and the character read is stored in a buffer (step S6) so that the original data can be checked for a run of four or more repeated characters. In the second and subsequent cycles, the character read from the original data is compared with the character stored in the buffer (step S7). If the two are equal, the character belongs to a run, and a run of four or more identical characters is compressed: the repeat counter Cr is incremented by 1 (step S8) and another character is read from the original data. If the two are not equal, the repeat counter Cr is compared with 4 (step S9). If Cr is less than 4, no more than three identical characters have repeated, so no compression is performed. When Cr is 4 or more, the compressed format is created (step S10).
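The conventional scheme described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the flowchart of FIG. 3 itself: the marker value `ESC`, the cap of 255 on a run (so that the count fits in one byte), and storing the run length as the count are all choices made for this sketch, and the case where the marker byte itself appears in the input is not handled.

```python
ESC = 0x00     # hypothetical marker byte; assumed not to occur in the input
MIN_RUN = 4    # threshold taken from the comparison with 4 in the text


def rle_encode(data: bytes) -> bytes:
    """Replace each run of MIN_RUN or more equal bytes by ESC, byte, count."""
    out = bytearray()
    i = 0
    while i < len(data):
        # Find the end of the run starting at i (at most 255 bytes long).
        j = i
        while j < len(data) and data[j] == data[i] and j - i < 255:
            j += 1
        run = j - i
        if run >= MIN_RUN:
            out += bytes([ESC, data[i], run])
        else:
            out += data[i:j]   # too short to be worth compressing
        i = j
    return bytes(out)
```

For example, `rle_encode(b"AAAAAABCC")` yields the three bytes `ESC, 0x41, 6` followed by `BCC`, while a run of only three characters is copied unchanged.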

[0005]

[Problems to Be Solved by the Invention] With this conventional data compression system, the data size can be reduced when the same character appears consecutively in a data string of 1-byte code characters such as ASCII. In a data string of 2-byte code characters such as Shift JIS, however, adjacent bytes are not equal even when the same character appears consecutively, so the data size cannot be reduced.
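The problem can be seen directly by measuring the longest run of equal adjacent bytes. The short Python check below is only an illustration; the helper name `longest_run` and the sample strings are assumptions made for this example.

```python
def longest_run(data: bytes) -> int:
    """Length of the longest run of equal adjacent bytes."""
    best = run = 0
    prev = None
    for b in data:
        run = run + 1 if b == prev else 1
        best = max(best, run)
        prev = b
    return best


ascii_data = b"AAAAAA"                          # six 1-byte characters
sjis_data = ("\u3042" * 6).encode("shift_jis")  # six hiragana "a", 2 bytes each

print(longest_run(ascii_data))  # 6
print(sjis_data.hex())          # 82a0 repeated six times
print(longest_run(sjis_data))   # 1
```

The Shift JIS string repeats the same character six times, yet no two adjacent bytes are equal, so byte-wise run-length encoding finds nothing to compress.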

[0006] An object of the present invention is to provide a Japanese data compression system that solves this problem.

[0007]

[Means for Solving the Problems] The Japanese data compression system of the present invention builds on a data compression system that reduces data size by converting data in which the same character is repeated into a special character indicating compression, the repeated character data, and the number of repetitions. It is characterized by adding a preprocessing step that, for Japanese data consisting of upper and lower bytes, focuses on the upper byte and converts a data string in which the same character recurs at every other byte (the upper byte) into a special character indicating repetition of the upper byte, the repeated character, the number of repetitions, and a data string consisting only of the lower bytes.

[0008] In another aspect, the Japanese data compression system of the present invention builds on a data compression system that reduces data size by converting consecutive data consisting of the same character into a special character indicating compression, the repeated character data, and the number of repetitions. It is characterized by adding a preprocessing step that, for Japanese data consisting of upper and lower bytes, focuses only on the upper byte and, when the upper byte matches a predetermined code and recurs consecutively, converts it in the same manner as the above data compression system into a special character indicating compression, the repeated character data, and the number of repetitions, followed by the lower bytes of the compressed characters.

[0009]

[Embodiments] Embodiments of the present invention will now be described with reference to the drawings.

[0010] FIG. 1 is a block diagram showing one embodiment of the present invention. Data to be compressed 11 is Japanese data in which each character occupies 2 bytes, for example Shift JIS. This data is first compressed by the compression preprocessing unit 12 of the present invention, and then the ordinary run-length compression described under the prior art is performed by compression processing unit 13. The result is compressed data 14.

[0011] FIG. 4 shows the algorithm of the compression preprocessing unit 12 in detail; one embodiment of the unit is described below with reference to the figure. First, the stack that holds lower bytes during the compression operation is cleared (step S11), and the character counter Cc and the repeat counter Cr are reset to 0 (steps S12, S13). One character (the upper byte) is read from the data to be compressed 11 (step S14), one more character (the lower byte) is read (step S15), and the character counter is incremented by 1 (step S16). If the character counter value is 1 (step S17), the upper byte just read is set in a comparison register as the character whose repetition is to be tracked (step S18), and processing returns to reading the next character.

[0012] In the second and subsequent cycles, the character (upper byte) read from the data to be compressed 11 is compared with the character held in the register (step S19). If the two are equal, the character belongs to a run, and a run in which four or more identical characters recur at every other byte is compressed: the repeat counter is incremented by 1 (step S20) and the loop repeats. If the character (upper byte) just read is not equal to the register (step S21), the repeat counter is compared with 4. If it is less than 4, compression would yield no gain, so none is performed. If it is 4 or more, compression is worthwhile, and the run is converted to the compressed format (step S22). The compressed format consists of a 1-byte special character indicating compression, the 1-byte repeated character, the repeat counter value (1 byte), and the lower-byte string stored in the stack. The length of the lower-byte string equals the repeat counter value.
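The preprocessing step can be sketched in Python along the lines of the register and stack described above. This is a hedged illustration, not the exact flowchart of FIG. 4. The assumptions are: the input consists only of complete 2-byte characters, the marker value `ESC` is hypothetical (0x00 is not a Shift JIS lead byte), the stored count is the number of characters in the run so that it equals the length of the lower-byte string, and a run is cut at 255 characters so the count fits in one byte.

```python
ESC = 0x00     # hypothetical marker byte; not a valid Shift JIS lead byte
MIN_RUN = 4    # threshold taken from the comparison with 4 in the text


def flush(out: bytearray, register: int, stack: list) -> None:
    """Emit one finished run, compressed only when it is long enough."""
    if len(stack) >= MIN_RUN:
        # marker, repeated upper byte, count, then the lower bytes only
        out += bytes([ESC, register, len(stack)])
        out += bytes(stack)
    else:
        for low in stack:
            out += bytes([register, low])


def preprocess(data: bytes) -> bytes:
    out = bytearray()
    register = None    # upper byte of the current run
    stack = []         # lower bytes collected for the current run
    for k in range(0, len(data) - 1, 2):
        high, low = data[k], data[k + 1]
        # A different upper byte (or a full counter) ends the current run.
        if stack and (high != register or len(stack) == 255):
            flush(out, register, stack)
            stack = []
        register = high
        stack.append(low)
    if stack:
        flush(out, register, stack)
    return bytes(out)
```

With the five Shift JIS hiragana bytes `82a0 82a2 82a4 82a6 82a8` as input, the sketch produces `00 82 05 a0 a2 a4 a6 a8`: ten bytes become eight because the shared upper byte 0x82 is written only once.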

[0013] FIG. 5 shows an example of data strings before and after compression. The data string A to be compressed is compressed into compressed data B by the compression preprocessing unit 12.

[0014] The compressed data B produced by the compression preprocessing unit 12 is compressed by the ordinary run-length compression processing unit 13 into compressed data C of FIG. 5, which is the compressed data 14 of FIG. 1. For comparison, FIG. 5 also shows compressed data C' obtained by compressing the data A with the ordinary run-length compression processing unit 2 of FIG. 2. It can be seen that character strings that could not previously be compressed can be compressed by this embodiment.
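The relationship between A, B, C and C' can be reproduced with a small two-stage sketch in Python. The sample data is not the data of FIG. 5, which is not reproduced here; it is eight repetitions of one Shift JIS character chosen for illustration. The two marker values `PRE_ESC` and `RLE_ESC`, the threshold of 4, and the 255 cap on counts are assumptions of this sketch.

```python
PRE_ESC = 0x00   # hypothetical marker written by the preprocessing stage
RLE_ESC = 0x01   # hypothetical marker written by the ordinary run-length stage
MIN_RUN = 4


def run_length(data: bytes) -> bytes:
    """Ordinary byte-wise run-length encoding of runs of MIN_RUN or more."""
    out = bytearray()
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i] and j - i < 255:
            j += 1
        if j - i >= MIN_RUN:
            out += bytes([RLE_ESC, data[i], j - i])
        else:
            out += data[i:j]
        i = j
    return bytes(out)


def preprocess(data: bytes) -> bytes:
    """Collapse runs of equal upper bytes, keeping only the lower bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i] and (j - i) // 2 < 255:
            j += 2   # step from upper byte to upper byte
        count = (j - i) // 2
        if count >= MIN_RUN:
            out += bytes([PRE_ESC, data[i], count])
            out += data[i + 1:j:2]   # lower bytes of the run
        else:
            out += data[i:j]
        i = j
    return bytes(out)


a = bytes([0x82, 0xA0]) * 8      # A: the same 2-byte character eight times
b = preprocess(a)                # B: marker, 0x82, 8, then eight 0xA0 bytes
c = run_length(b)                # C: the eight 0xA0 bytes collapse as well
c_prime = run_length(a)          # C': ordinary run-length encoding alone
print(len(a), len(b), len(c), len(c_prime))  # 16 11 6 16
```

In this sample the preprocessing removes the repeated upper bytes and leaves the lower bytes adjacent, so the second stage can compress them; ordinary run-length encoding applied directly to A leaves it unchanged.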

[0015] Restoration works as follows. When the special character is detected, the repeated character and the repeat count that follow it are read. The repeated character is then used as the upper byte for as many characters as the count indicates: each time a lower byte is read, the upper byte is prepended and a 2-byte character is output.
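The restoration procedure can be sketched in Python as follows. This is an illustration under assumptions: the marker value `ESC` is hypothetical and must match the one used when compressing, uncompressed characters are assumed to be stored as plain 2-byte pairs, and the input is assumed to be well formed (no truncated records).

```python
ESC = 0x00   # hypothetical marker byte; must match the one used when encoding


def restore(packed: bytes) -> bytes:
    """Undo the upper-byte preprocessing and return 2-byte characters."""
    out = bytearray()
    i = 0
    while i < len(packed):
        if packed[i] == ESC:
            # Read the repeated upper byte and the repeat count.
            high, count = packed[i + 1], packed[i + 2]
            # Prepend the upper byte to each of the next `count` lower bytes.
            for low in packed[i + 3:i + 3 + count]:
                out += bytes([high, low])
            i += 3 + count
        else:
            out += packed[i:i + 2]   # ordinary 2-byte character
            i += 2
    return bytes(out)
```

For example, `restore(bytes.fromhex("008205a0a2a4a6a8"))` returns the ten bytes `82a0 82a2 82a4 82a6 82a8`.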

[0016] In this way, by compressing character runs that recur at every other byte in the compression preprocessing, Japanese data consisting of consecutive 2-byte characters can be compressed.

[0017]

[Effects of the Invention] As described above, the present invention detects characters that recur at every other byte in the compression preprocessing and reduces the number of upper bytes, which makes it possible to compress Shift JIS text composed of combinations of hiragana or alphanumeric characters. In addition, because the preprocessing leaves the lower bytes adjacent to one another, runs of identical 2-byte Shift JIS characters can also be compressed by conventional run-length encoding.

[Brief description of drawings]

FIG. 1 is a block diagram of one embodiment of the present invention.

FIG. 2 is a block diagram of a conventional example.

FIG. 3 is a flowchart of the run-length encoding algorithm.

FIG. 4 is a flowchart of the compression preprocessing.

FIG. 5 is a diagram for explaining the effect of the present invention.

[Explanation of symbols]

1, 11: data to be compressed
2, 13: compression processing unit
3, 14: compressed data
12: compression preprocessing unit

Claims

1. A Japanese data compression system characterized in that, to a data compression system which reduces data size by converting data in which the same character is repeated into a special character indicating compression, repeated character data, and a repetition count, there is added preprocessing which, for Japanese data consisting of upper and lower bytes, focuses on the upper byte and converts a data string in which the same character recurs at every other byte (the upper byte) into a special character indicating repetition of the upper byte, the repeated character, the repetition count, and a data string consisting only of the lower bytes.

2. A Japanese data compression system characterized in that, to a data compression system which reduces data size by converting consecutive data consisting of the same character into a special character indicating compression, repeated character data, and a repetition count, there is added preprocessing which, for Japanese data consisting of upper and lower bytes, focuses only on the upper byte and, when the upper byte matches a predetermined code and recurs consecutively, converts it, in the same manner as said data compression system, into a special character indicating compression, the repeated character data, and the repetition count, followed by the lower bytes of the compressed characters.