JPH04280517A - Data compression and restoring system

Info

Publication number: JPH04280517A
Application number: JP4360491A
Authority: JP
Inventors: Shigeru Yoshida; 茂吉田; Yoshiyuki Okada; 佳之岡田; Yasuhiko Nakano; 泰彦中野; Hirotaka Chiba; 広隆千葉
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1991-03-08
Filing date: 1991-03-08
Publication date: 1992-10-06

Abstract

PURPOSE: To achieve efficient compression and restoration of multi-value data by arithmetic coding, in a data compression system that applies arithmetic coding to data such as character codes or image data. CONSTITUTION: A data compression and restoration system that encodes an input character string with an arithmetic code is provided with an SOR encoding section (1), which, each time a character string is input, rearranges the dictionary so that character strings with a higher frequency of occurrence have smaller registration numbers and encodes the character string according to its registration number in the dictionary, and an arithmetic coding section (2), which receives the code output from the SOR encoding section (1) and applies arithmetic coding to it. The input character string is first encoded with the SOR code, and that code is then encoded with the arithmetic code.

Description

[Detailed description of the invention]

[0001]

[Field of Industrial Application] The present invention relates to a data compression system that arithmetically encodes data such as character codes and image data. As computers have come to handle many kinds of data, such as character codes and image data, the amount of data handled has also grown. When processing such large volumes of data, it is desirable to remove the redundant parts of the data and compress it before storing or transferring it, in order to reduce storage requirements and make transmission to remote locations practical.

[0002] Universal coding is a data compression approach that can be applied to many kinds of data (character codes, image data, etc.). There are various universal coding methods, of which arithmetic coding is a representative one. An object of the present invention is to provide a system that efficiently compresses and restores multi-value data by arithmetic coding.

[0003]

[Prior Art] Arithmetic coding achieves maximum compression efficiency when the occurrence frequencies of the characters of the information source are known. Universal codes can be applied not only to character codes in document data but also to the compression of image data and the like. In the following description, one word of data is called a character, and a sequence of any number of words of data is called a character string.

[0004] Multi-value arithmetic coding is explained with reference to FIG. 7. FIG. 7(A) shows the occurrence frequency of each symbol (character) when the input consists of the four characters a, b, c, and d. FIG. 7(B) shows the symbols and their cumulative occurrence frequencies. FIG. 7(C) illustrates arithmetic coding. Multi-value arithmetic coding is a coding method that maps the character string of the input data to a single point on the number line [0, 1] and represents it by one code. As a coding example, consider for simplicity an information source consisting of only the four characters a, b, c, and d.

[0005] Assume that the occurrence frequency of each symbol is as shown in FIG. 7(A) (expressed as a probability by dividing the number of occurrences of each symbol by the total number of occurrences). Next, the symbols are rearranged in order of frequency, and the cumulative probabilities are calculated starting from the most frequent symbol, as shown in FIG. 7(B): the most frequent symbol c accumulates the occurrence frequencies (probabilities) of b, d, and a; b accumulates those of d and a; and d accumulates that of a.

[0006] When symbol (character) c is input, interval 50, obtained by dividing the interval [0, 1] according to the cumulative frequency of c, is selected as shown in FIG. 7(C). When symbol a is input next, interval 51, the part of interval 50 that corresponds to symbol a as determined from the cumulative occurrence frequencies (normalized so that the maximum cumulative occurrence frequency is 1), is selected. When symbol d is input next, interval 52, obtained by dividing interval 51 according to the cumulative frequency of symbol d, is selected, and an arbitrary value in interval 52 (any value between its upper and lower ends) is output as the code.

[0007] The lower end and the width of the interval are obtained by the following equations.

    lower end of new subinterval = lower end of current subinterval + width of current subinterval x cumulative probability of the occurring character
    width of new subinterval = width of current subinterval x probability of the occurring character

To restore a code word, the original character (symbol) string is recovered by repeatedly subdividing the interval according to the probability of each character and checking which subinterval contains the code word.
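The interval update and the restoration step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the probabilities in `PROB` are hypothetical (the text does not give the values of FIG. 7), and exact fractions are used so that no finite-precision handling is needed.

```python
from fractions import Fraction as F

# Hypothetical probabilities (not taken from FIG. 7), listed in descending order.
PROB = {"c": F(4, 10), "b": F(3, 10), "d": F(2, 10), "a": F(1, 10)}

def cumulative(prob):
    """Cumulative probability of a symbol = sum of the probabilities of
    the symbols ranked after it (so the last symbol gets 0)."""
    cum, running = {}, F(0)
    for sym in reversed(list(prob)):
        cum[sym] = running
        running += prob[sym]
    return cum

CUM = cumulative(PROB)

def encode_interval(text):
    """Narrow [0, 1) once per character; return (lower end, width)."""
    low, width = F(0), F(1)
    for ch in text:
        low = low + width * CUM[ch]   # new lower end
        width = width * PROB[ch]      # new width
    return low, width

def decode(value, length):
    """Recover `length` characters from any value inside the final interval."""
    low, width, out = F(0), F(1), []
    for _ in range(length):
        scaled = (value - low) / width
        for sym in PROB:
            if CUM[sym] <= scaled < CUM[sym] + PROB[sym]:
                break
        out.append(sym)
        low, width = low + width * CUM[sym], width * PROB[sym]
    return "".join(out)
```

With these assumed probabilities, `encode_interval("cad")` returns the lower end 151/250 and width 1/125, and any value inside that interval decodes back to "cad".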

[0008] With multi-value arithmetic coding, the more frequently a character string occurs, the wider its final interval becomes and the shorter the code that can represent it, so the amount of data is compressed. In addition, this method places no minimum-unit restriction on the bit representation of the code word, so the minimum unit can be smaller than one bit, which permits highly efficient compression.

[0009] FIG. 8 shows the configuration of an arithmetic coding device. In the figure, 60 is a frequency-order rearrangement unit, 61 is a counter that counts frequencies, 62 is a dictionary consisting of symbols rearranged in frequency order and their identification numbers, and 63 is an arithmetic coding unit that performs arithmetic coding using the cumulative occurrence frequency output from the counter and the identification number output from the frequency-order rearrangement unit.

[0010] The operation of the configuration in the figure is as follows. Each time one character of the input character string is input, the counter 61 counts its occurrence frequency. The frequency-order rearrangement unit 60 rearranges the symbols in order of occurrence frequency and rewrites the dictionary 62. The arithmetic coding unit 63 then determines the upper and lower ends of the interval for the input character from the interval width and the cumulative occurrence probability, and outputs an arbitrary value in the interval as the code.

[0011] FIGS. 9 and 10 show the flow of arithmetic coding. The initial values are upper end = 1, lower end = 0, and interval width = 1.0. cum[i] denotes the occurrence frequency of identification number i (the i-th when the occurrence frequencies are sorted by magnitude). cum_freq[i] is the i-th cumulative frequency, representing the cumulative occurrence frequency from cum[i] to cum[N]. The values are normalized so that cum_freq[1] = 1 and cum_freq[N] = 0.

[0012] FIG. 9 shows the flow of multi-value arithmetic coding when the history of the occurring characters is not considered (single history).
S1 (1) Assign every single character to a slot of dictionary D: D(i) = i (i = 1, 2, ..., A), where A is the size of the alphabet (the number of letters, symbols, and so on), at most 256.
   (2) Assign a number to each character: I(i) = i (i = 1, 2, ..., A).
   (3) Initialize the occurrence frequencies: freq(i) = 1 (i = 1, 2, ..., A).
   (4) Initialize the cumulative occurrence frequencies: cum_freq(i) = A - i (i = 1, 2, ..., A). They are initialized this way because all the cumulative occurrence frequencies must have different values.
S2 Input one character k.
S3 Find the slot (interval) of the input character from j = I(k), i = D(j), and arithmetically encode the number j.

[0013] S4 Exchange the characters assigned to the slots so that they stay in order of occurrence frequency. Among the symbols whose numbers are smaller than j, find one whose occurrence frequency differs from that of the j-th symbol; that is, find the largest r (< j) such that freq(r) != freq(j) (!= means "not equal"). Let the identification number of the character in the r-th occurrence-frequency slot be I(r) = s, and let the identification number of the character corresponding to the r-th cumulative-occurrence-frequency slot be t (D(r) = t). The r-th and j-th entries are then exchanged: I(j) = s (the character with identification number s), D(j) = t, I(r) = j, D(r) = i (= D(j)).
S5 Increment the occurrence frequency and the cumulative occurrence frequency by 1. The cumulative occurrence frequency is incremented by 1 for every slot numbered r or lower. The steps from S2 onward are then repeated.
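The per-character bookkeeping that steps S4 and S5 describe can be sketched as below. This is a simplified illustration and not the patent's exact index arithmetic: it assumes 0-based slots held in plain Python lists, moves the coded symbol to the first slot of equal frequency (a common way to keep slots sorted), and recomputes the cumulative counts from scratch. The names `update_model` and `cumulative` are hypothetical.

```python
def update_model(symbols, freq, pos):
    """Increment the count of the symbol in slot `pos` while keeping the
    slots in non-increasing frequency order. Returns the symbol's new slot."""
    target = pos
    # Walk forward over all slots that share the same frequency.
    while target > 0 and freq[target - 1] == freq[pos]:
        target -= 1
    # Both slots hold equal counts, so only the symbols need swapping.
    symbols[pos], symbols[target] = symbols[target], symbols[pos]
    freq[target] += 1
    return target

def cumulative(freq):
    """cum[i] = total count of all slots ranked after slot i."""
    cum = [0] * len(freq)
    for i in range(len(freq) - 2, -1, -1):
        cum[i] = cum[i + 1] + freq[i + 1]
    return cum
```

Every coded character triggers this search-and-swap in the conventional scheme, which is the per-character cost the invention aims to reduce.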

[0014] FIG. 10 shows the flow when double history is considered for the occurring characters. When double history is considered, combinations of two consecutive characters are taken into account: a dictionary is created for each character, and the occurrence frequency of the character that follows is recorded separately for each second character. For example, the dictionary for character a records an occurrence frequency for each character that has appeared after a (b for ab, c for ac, and so on).
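One way to picture the per-character tables is a mapping from the preceding character to a counter of the characters that followed it. The sketch below is illustrative only; `following_counts` is a hypothetical name and the data layout is an assumption, not the structure used in FIG. 10.

```python
from collections import Counter, defaultdict

def following_counts(text):
    """For each character, count how often each character follows it."""
    tables = defaultdict(Counter)
    for prev, cur in zip(text, text[1:]):
        tables[prev][cur] += 1
    return tables
```

For the input "abacab", the table for a records b twice and c once, which corresponds to the "b for ab, c for ac" entries described above.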

[0015] Accordingly, the slots of dictionary D are allocated as D(p, i) (p, i = 1, 2, ..., A), one for each character i that follows character p, where A is the size (number of characters) of the alphabet. The numbers assigned to the characters are likewise I(p, i) (p, i = 1, 2, ..., A). Apart from these points the flow does not differ from that of FIG. 9, so its description is omitted.

[0016]

[Problems to be Solved by the Invention] As described above, arithmetic coding requires the symbols to be re-sorted by occurrence frequency every time a character is input, and this processing takes a long time. Conventional arithmetic coding therefore achieves a high compression ratio but is inefficient in terms of processing time. An object of the present invention is to provide a data compression and restoration system that performs arithmetic coding at high speed.

[0017]

[Means for Solving the Problems] In the present invention, the input character string is first encoded by a coding method based on a self-organizing rule, in which frequently occurring characters are always encoded with small numbers (hereinafter called the SOR code), and the SOR code is then arithmetically encoded.

[0018] Before the present invention is described, the SOR code is explained. In the SOR encoding method, character strings that appear in the input data are registered in a dictionary one by one, and a character string that appears again is encoded as the registration number of the same string in the dictionary. The dictionary is updated dynamically at each character input so that character strings that are more likely to appear have smaller registration numbers.

[0019] FIG. 11 shows the coding method based on a self-organizing coding rule (the SOR encoding method). FIG. 11(A) shows the MTF (Move To Front) method, one of the self-organizing methods. Consider the case where there are four possible input characters, a, b, c, and d, and the illustrated input data 'abacbc...ad' is encoded.
(1) The first character a is output as the code 1a and registered under number 1 in the dictionary.
(2) The second character b is output as the code 2b. In the dictionary, the character a previously registered under number 1 is moved down to number 2, and b is registered under number 1.
(3) The third character a is output as the code 2. In the dictionary, a is registered under number 1 and b becomes number 2.

[0020] (4) The fourth input character c is output as the code 3c. In the dictionary, c is registered under number 1, a under number 2, and b under number 3.
(5) The fifth character b is output as the code 3. In the dictionary, b is registered under number 1, c under number 2, and a under number 3.
Thereafter, in the same way, each character that appears is encoded by its registration number in the dictionary, and the dictionary is updated so that the character that appeared is registered under number 1.
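The MTF steps (1) to (5) above can be reproduced with a short sketch. It is an illustration under stated assumptions: the dictionary is a Python list whose index 0 holds registration number 1, a first occurrence is emitted as the new slot number followed by the raw character (matching the "1a", "2b", "3c" notation of the example), and `mtf_encode` is a hypothetical name.

```python
def mtf_encode(text):
    """Encode `text` by registration number, moving each character to the front."""
    dictionary = []   # index 0 holds registration number 1
    out = []
    for ch in text:
        if ch in dictionary:
            number = dictionary.index(ch) + 1
            out.append(str(number))            # already registered: number only
            dictionary.remove(ch)
        else:
            number = len(dictionary) + 1
            out.append(f"{number}{ch}")        # new: slot number plus raw character
        dictionary.insert(0, ch)               # move (or register) at the front
    return out, dictionary
```

Calling `mtf_encode("abacb")` yields the codes 1a, 2b, 2, 3c, 3 and leaves the dictionary in the order b, c, a, as in steps (1) to (5).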

[0021] FIG. 11(B) shows the TR (Transpose) method, another of the self-organizing methods. The case shown is one where there are four possible input characters, a, b, c, and d, and the illustrated character string 'abacb...ad' is encoded.
(1) The first character a is output as the code 1a, and a is registered in the dictionary under number 1.
(2) The second character b is output as the code 2b, and b is registered in the dictionary under number 2.
(3) The third character a is output as the code 1.
(4) The fourth character c is output as the code 3c, and c is registered in the dictionary under number 3.

[0022] (5) The fifth character b is output as the code 2; the registration number of b in the dictionary is moved up by one, and that of a is moved down by one.
(6) The sixth character c is output as the code 3; the number of c in the dictionary is moved up by one, and a is moved down by one.
(7) The seventh character a is output as the code 3; the number of a in the dictionary is moved up by one, and c is moved down by one.
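The TR steps (1) to (7) can be traced with a similar sketch. The assumptions are the same as for any list-based illustration: index 0 holds registration number 1, a first occurrence is appended at the end and emitted as slot number plus raw character, and a newly registered character is not transposed (as in the worked example). `tr_encode` is a hypothetical name.

```python
def tr_encode(text):
    """Encode `text` by registration number, swapping each repeated
    character with its immediate predecessor in the dictionary."""
    dictionary = []   # index 0 holds registration number 1
    out = []
    for ch in text:
        if ch in dictionary:
            pos = dictionary.index(ch)
            out.append(str(pos + 1))
            if pos > 0:   # move up one place, pushing the neighbour down
                dictionary[pos - 1], dictionary[pos] = (
                    dictionary[pos], dictionary[pos - 1])
        else:
            dictionary.append(ch)
            out.append(f"{len(dictionary)}{ch}")
    return out, dictionary
```

For the input "abacbca" this gives the codes 1a, 2b, 1, 3c, 2, 3, 3, matching steps (1) to (7), with the dictionary ending in the order b, a, c.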

[0023] Thereafter, in the same way, each character that appears is encoded by its registration number in the dictionary, and its registration number is moved up by one. With self-organizing methods such as these, the rearrangement gives more frequently appearing characters smaller registration numbers, so representing smaller registration numbers with fewer bits reduces the amount of data in the output code. Moreover, because the SOR encoding method places more frequently occurring characters nearer the head of the dictionary, arithmetically encoding the SOR code lightens the burden of sorting by occurrence frequency in the arithmetic coding.

[0024] FIG. 1 shows the basic configuration of the present invention. In the figure, 1 is an SOR encoding unit, 2 is a multi-value arithmetic coding unit, 3 is a dictionary in which character strings are registered in association with registration numbers, and 4 is a counter that counts cumulative frequencies. 5 is a search/registration unit that searches the dictionary for an input character string and registers newly appearing character strings in the dictionary, and 6 is a dictionary rearrangement unit that rearranges the registration numbers of the character strings in the dictionary.

[0025]

[Operation] The operation of the configuration in FIG. 1 is as follows. A character string input to the SOR encoding unit 1 is looked up in the dictionary 3 by the search/registration unit 5. If the same character string is in the dictionary, its registration number is output as the SOR code. If the input character string is not yet registered, the search/registration unit 5 registers it in the dictionary 3 and outputs the input character string as raw data. The dictionary rearrangement unit 6 then updates the dictionary according to a self-organizing rule such as the MTF method or the TR method, rearranging the registration numbers of the character strings in the dictionary.

[0026] The SOR code output from the SOR encoding unit 1 is input to the multi-value arithmetic coding unit 2. Meanwhile, the counter 4 counts the occurrence frequency and cumulative frequency of each character string and supplies them to the multi-value arithmetic coding unit 2. From the number of the character string given by the input SOR code, the cumulative occurrence frequency, and so on, the multi-value arithmetic coding unit 2 determines the interval width and the upper and lower ends of the interval corresponding to the input character string, and outputs an arbitrary value within the interval as the code.

[0027]

[Embodiments] Embodiments of the present invention are described with reference to FIGS. 2 to 6. In the present invention, the registration number of a character string obtained by referring to the dictionary in SOR encoding is expressed as a variable-length code in which smaller registration numbers have shorter code lengths. FIG. 2 shows the variable-length coding method for registration numbers according to the present invention. FIG. 2(a) shows an example of a variable-length code. If the occurrence probabilities of the symbols a, b, c, and d are 0.5, 0.25, 0.15, and 0.1, then a is assigned the code "1", b the code "01", c the code "001", and d the code "000". Assigning short codes to frequently occurring symbols and long codes to rarely occurring symbols in this way improves the data compression ratio.

[0028] The Elias codes are examples of variable-length codes. FIG. 2(b) shows the Elias codes. The Elias gamma code attaches a prefix indicating the number of digits to the binary representation of a number; the prefix consists of as many 0s as the number of binary digits minus one. For example, as illustrated, number 1 has the code "1" with no prefix. For number 2, one 0 (number of digits 2 - 1 = 1) is placed before its binary representation "10". Number 3 is "11" in binary, which has two digits, so a single 0 is attached as the prefix. Similarly, number 4 is "100" in binary, which has three digits, so two 0s are attached as the prefix.
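The gamma code rule above translates directly into code. The following is a minimal sketch using bit strings for readability; `elias_gamma` and `decode_gamma_stream` are hypothetical names, and a practical coder would pack bits rather than build strings.

```python
def elias_gamma(n):
    """Gamma code of a positive integer as a string of '0'/'1' characters."""
    if n < 1:
        raise ValueError("the gamma code is defined for n >= 1")
    binary = bin(n)[2:]                       # binary digits of n
    return "0" * (len(binary) - 1) + binary   # prefix: (digits - 1) zeros

def decode_gamma_stream(bits):
    """Split a concatenation of gamma codes back into the numbers."""
    numbers, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":                 # count the prefix zeros
            zeros += 1
            i += 1
        numbers.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return numbers
```

The decoder relies only on the count of leading zeros to find each code's length, so concatenated codes can be split without separators.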

[0029] With Elias variable-length codes of this kind, code words can be decoded uniquely even when they are packed bit by bit. The Elias delta code is built on the Elias gamma code. In the Elias delta code, a gamma code forms the upper bits, and lower bits are appended whose count equals the number represented by that gamma code minus 1. The binary value of the lower bits then identifies the number within the group designated by the upper bits. For example, "010", the gamma code for number 2, is used to represent numbers 2 and 3. In both cases the upper bits are "010", and since 2 - 1 = 1, one lower bit is appended: appending "0" to "010" gives number 2, and appending "1" to "010" gives number 3. Similarly, numbers 4, 5, 6, and 7 use "011", the gamma code for number 3, as the upper bits, with 3 - 1 = 2 appended bits "00", "01", "10", and "11", so that number 4 is "01100", number 5 is "01101", number 6 is "01110", and number 7 is "01111". Larger numbers are represented by variable-length codes in the same way, as illustrated.
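The delta code construction can be sketched in the same style: the gamma code of the binary digit count forms the upper bits, followed by the binary digits of the number without its leading 1. This is an illustrative sketch with hypothetical function names; the gamma helper is repeated so the block is self-contained.

```python
def elias_gamma(n):
    """Gamma code of a positive integer as a string of '0'/'1' characters."""
    if n < 1:
        raise ValueError("the gamma code is defined for n >= 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary

def elias_delta(n):
    """Delta code: gamma code of the digit count, then the digits of n
    after the leading 1 (digit count - 1 lower bits)."""
    if n < 1:
        raise ValueError("the delta code is defined for n >= 1")
    binary = bin(n)[2:]
    return elias_gamma(len(binary)) + binary[1:]
```

For numbers 2 to 7 this yields 0100, 0101, 01100, 01101, 01110, and 01111, in line with the groups described above.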

[0030] FIG. 3 shows the flow of the SOR encoding performed before arithmetic coding in the present invention.
S1 (1) Initialize the dictionary: D(i) = 0 (i = 1, 2, ..., A), where A is the size of the alphabet (the number of letters, symbols, and so on), at most 256.
   (2) Assign a number to each character: I(i) = 0 (i = 1, 2, ..., A).
   (3) Set the number of dictionary slots in use to n = 0.
S2 Input one character k.
S3 Identify the number j of character k (j = I(k)).
S4 Determine whether the number j of character k is 0. If j = 0, character k is not yet registered in the dictionary, so proceed to S5. If j != 0, proceed to S7.
S5 If j = 0, increment n by 1 (n = n + 1) and encode the number n with an Elias code. Then encode the raw data k.
S6 Register character k in the dictionary under number n (D(n) = k, I(k) = n).
S7 If j != 0 in S3, character k is already registered in the dictionary under number j, so encode the number j with an Elias code.

[0031] S8 Update the dictionary by MTF or TR, then return to S2 and input the next character k (MTF follows S9, TR follows S10).
S9 The MTF processing of S8. The character numbers I(s) and registration numbers D(s) for s = j down to 2 are each moved down by one (I(D(s)) = I(D(s-1))). Character k is then registered in the dictionary under number 1 (I(1) = k, D(k) = 1).
S10 The TR processing of S8. If j is not 1 (j != 1), the number of character k and its registration number in the dictionary are exchanged with the character number and registration number of the entry one place earlier in the dictionary (r = D(j-1), D(j-1) = k, D(j) = r, I(k) = j-1, I(r) = j).
In the present invention, the SOR code of the input character string obtained by the above flow is arithmetically encoded.
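The flow S1 to S10 can be sketched with the two tables kept explicitly. The sketch below is one reading of the flow, not a verified transcription: D maps a registration number to its character, I maps a character to its registration number (a missing key stands for the value 0), each output item is a pair of the number and the raw character (or None when the character was already registered), and Elias coding of the numbers is left out. The name `sor_encode` is hypothetical.

```python
def sor_encode(data, rule="MTF"):
    """Return a list of (registration number, raw character or None)."""
    D = {}     # registration number -> character
    I = {}     # character -> registration number (absent means unregistered)
    n = 0      # number of dictionary slots in use
    out = []
    for k in data:
        j = I.get(k, 0)
        if j == 0:                      # S5/S6: new character
            n += 1
            j = n
            out.append((j, k))
            D[j], I[k] = k, j
        else:                           # S7: already registered
            out.append((j, None))
        if rule == "MTF":               # S9: shift entries 1..j-1 down by one
            for s in range(j, 1, -1):
                D[s] = D[s - 1]
                I[D[s]] = s
            D[1], I[k] = k, 1
        elif rule == "TR" and j > 1:    # S10: swap with the previous entry
            r = D[j - 1]
            D[j - 1], D[j] = k, r
            I[k], I[r] = j - 1, j
    return out
```

Under this reading a newly registered character also passes through the update step S8, so with `rule="TR"` a new character is transposed immediately after registration.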

[0032] FIGS. 4 to 6 show the flow of arithmetic coding according to the present invention. FIG. 4 is the flow when the history of the occurring characters is not considered.
S1 (1) Assign every single character to dictionary D: D(i) = i (i = 1, 2, ..., A), where A is the size of the alphabet (the number of letters, symbols, and so on), at most 256.
   (2) Assign a number to each character: I(i) = i (i = 1, 2, ..., A).
   (3) Initialize the occurrence frequency of each character: app(i) = 0 (i = 1, 2, ..., A).
   (4) Initialize the cumulative frequencies: cum_freq(i) = A - i (i = 1, ..., A).
   (5) Set the number of dictionary slots in use to n = 0.

[0033] S2 Input one character k.
S3 Find the number of character k and arithmetically encode it (the arithmetic coding flow is the same as in FIG. 9, so its description is omitted).
S4 Update the dictionary by MTF or TR (MTF is shown in S6, TR in S7).
S5 Increment by 1 the cumulative frequency of the characters whose numbers are smaller than j (while j > 0, repeat j = j - 1, cum_freq(j) = cum_freq(j) + 1). The processing from S2 onward is then repeated.
S6 The MTF processing. It is the same as the MTF processing in FIG. 3, so its description is omitted.
S7 The TR processing. If the occurrence frequency of the j-th character is 0 (app(j) = 0), set app(j) = 1 and n = n + 1. Set the registration number D(n) = r and exchange the registration numbers in the dictionary (D(j-1) = k, D(j) = r, I(k) = j-1, I(r) = j). If app(j) is not 0, set r = D(j-1) and exchange the registration numbers in the dictionary.
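The idea of the flow, as read here, is that the statistics are kept per registration number rather than per character, so coding a character only increments counts and never re-sorts them. The sketch below illustrates this under several assumptions that are not in the source: the dictionary is updated by MTF only, every slot count starts at 1 so that no slot has zero probability, and the ideal code length -log2(p) stands in for the output of a real arithmetic coder. `code_ranks_with_slot_counts` is a hypothetical name.

```python
import math

def code_ranks_with_slot_counts(data, alphabet):
    """Return the registration numbers coded for `data` and the ideal
    total code length in bits under an adaptive per-slot model."""
    order = list(alphabet)        # dictionary: slot -> character
    count = [1] * len(order)      # one counter per slot (assumed start: 1)
    ranks, total_bits = [], 0.0
    for ch in data:
        j = order.index(ch)                           # 0-based slot
        ranks.append(j + 1)                           # registration number
        total_bits += -math.log2(count[j] / sum(count))
        count[j] += 1                                 # counter update only, no sorting
        order.insert(0, order.pop(j))                 # MTF update of the dictionary
    return ranks, total_bits
```

Because MTF keeps recently used characters near the front, the small registration numbers accumulate most of the counts, so the counters stay roughly in frequency order without any sorting step.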

[0034] FIG. 5 shows the flow of arithmetic coding according to the present invention when single history is considered, that is, when combinations of two consecutive characters are taken into account. In the figure, a dictionary is created for each character, and within it a dictionary entry is created for each character that follows. Accordingly, D(p, i) = i (p, i = 1, 2, ..., A) and I(p, i) = i (p, i = 1, 2, ..., A). The figure shows the case where the occurrence frequency and cumulative frequency are counted only for the first of the two consecutive characters (they are obtained from the number i alone, as app(i) and cum_freq(i), i = 1, 2, ..., A). The rest of the flow is the same as in FIG. 4, where history is not considered, so its description is omitted.
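A dictionary per preceding character, each updated independently, can be sketched as follows. This is an assumption-laden illustration: MTF is used as the update rule, every per-context dictionary starts with the alphabet in its given order, and the context before the first character is arbitrarily taken to be the first alphabet symbol. `context_mtf_ranks` is a hypothetical name.

```python
def context_mtf_ranks(data, alphabet):
    """Registration numbers of `data` when a separate MTF dictionary
    is kept for each preceding character p, i.e. D(p, i)."""
    lists = {p: list(alphabet) for p in alphabet}
    prev = alphabet[0]            # assumed initial context
    ranks = []
    for ch in data:
        order = lists[prev]       # dictionary selected by the previous character
        j = order.index(ch)
        ranks.append(j + 1)
        order.insert(0, order.pop(j))
        prev = ch
    return ranks
```

For "ababab" over the alphabet "abc" the ranks become 1, 2, 1, 1, 1, 1: once each context has seen its usual successor, that successor stays at registration number 1.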

[0035] FIG. 6 shows the case where single history is considered in the arithmetic coding and is also considered for the occurrence frequency and cumulative occurrence frequency. The flow in the figure does not differ from that of FIG. 4 except that the occurrence frequency is obtained as I(p, i) and the cumulative occurrence frequency as cum_freq(p, i) (i = 1, 2, ..., A), so its description is omitted.

[0036]

[Effects of the Invention] According to the present invention, the data input to the arithmetic coding unit has already been rearranged into approximately occurrence-frequency order, so the counting of occurrence frequencies in the arithmetic coding is simplified and the processing speed of the arithmetic coding can be improved.

[Brief explanation of the drawing]

FIG. 1 is a diagram showing the basic configuration of the present invention.

FIG. 2 is a diagram showing the variable-length coding method for registration numbers.

FIG. 3 is a diagram showing the flow of SOR encoding.

FIG. 4 is a diagram showing coding method (1) of the present invention (no history).

FIG. 5 is a diagram showing coding method (2) of the present invention (single history).

FIG. 6 is a diagram showing coding method (3) of the present invention (single history).

FIG. 7 is a diagram explaining the principle of the multi-value arithmetic code.

FIG. 8 is a diagram showing the configuration of a conventional arithmetic coding device.

FIG. 9 is a diagram showing the flow of conventional multi-value arithmetic coding (single history).

FIG. 10 is a diagram showing the flow of conventional multi-value arithmetic coding (double history).

FIG. 11 is a diagram showing coding based on a self-organizing rule (SOR encoding).

[Explanation of symbols]

1: SOR encoding unit, 2: multi-value arithmetic coding unit, 3: dictionary, 4: counter, 5: search/registration unit, 6: dictionary rearrangement unit

Claims

[Claims]

[Claim 1] In a data compression and restoration system that encodes an input character string with an arithmetic code, a data compression system comprising: an SOR encoding unit (1) that, each time a character string is input, rearranges the dictionary so that character strings with a higher frequency of occurrence have smaller registration numbers and encodes the character string according to its registration number in the dictionary; and an arithmetic coding unit (2) that receives the SOR code output from the SOR encoding unit (1) and arithmetically encodes it; characterized in that the input character string is encoded with the SOR code and that code is then encoded with the arithmetic code.

[Claim 2] A compressed data restoration system characterized in that the arithmetic code according to claim 1 is converted into the SOR code, and the character string is then restored from the SOR code.

[Claim 3] The data compression system according to claim 1, wherein the registration number of the input character string in the dictionary is moved up by one toward the smaller numbers, and the registration number of the character string that previously held that number is moved down by one.

[Claim 4] The data compression system according to claim 1, characterized in that the registration position of the input character string in the dictionary is moved to the head of the dictionary, and the registration number of each character string registered between the head of the dictionary and the position just before the one where the input character string was registered is moved down by 1.