JP3115066B2

JP3115066B2 - Dictionary search method

Info

Publication number: JP3115066B2
Application number: JP03324706A
Authority: JP
Inventors: 佳之岡田; 茂吉田; 泰彦中野; 広隆千葉
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1991-12-09
Filing date: 1991-12-09
Publication date: 2000-12-04
Anticipated expiration: 2015-12-04
Also published as: JPH05158987A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a dictionary search method for data compression. In particular, it relates to a dictionary search method for data compression in which an already encoded character string is divided into mutually distinct partial character strings, the partial character strings are registered in a dictionary, the dictionary is searched for the partial character string that gives the longest match with the input character string, and encoding is performed by specifying the number of that longest matching character string.

[0002] In recent years, computers have come to handle many kinds of data, such as character codes, vector information, and images, and the volume of data handled is increasing rapidly. When large amounts of data are handled, compressing the data by removing its redundant parts reduces the storage capacity required and allows faster transmission. Universal coding has been proposed as a method that can compress many kinds of data with a single scheme.

[0003]

[Prior Art] A universal code is an information-preserving (lossless) data compression method. Because it does not assume the statistical properties of the information source in advance, it can be applied to various types of data (character codes, object code, and so on). Document images show similarity in character outlines and character spacing, and halftone images show similarity in halftone-dot periodicity and halftone-dot shape. A universal code can reduce the redundancy that arises from such similarity and thereby achieve effective compression. In the following, adopting the terminology of information theory, one word of data is called a character, and any number of words connected together is called a character string.

[0004] A representative universal code is the Ziv-Lempel code; see, for example, Munakata, "Ziv-Lempel Data Compression Method", Joho Shori (Information Processing), Vol. 26, No. 1, 1985. Two algorithms have been proposed for the Ziv-Lempel code: the universal type and the incremental parsing type. A practical method based on the universal type algorithm is the LZSS code (T. C. Bell, "Better OPM/L Text Compression", IEEE Trans. on Commun., Vol. COM-34, No. 12, Dec. 1986), and a practical method based on the incremental parsing algorithm is the LZW (Lempel-Ziv-Welch) code (T. A. Welch, "A Technique for High-Performance Data Compression", Computer, June 1984). Of these codes, the LZW code has come to be used for file compression in storage devices and similar applications because it can be processed at high speed and its algorithm is simple.

[0005] LZW encoding

In LZW encoding, a rewritable dictionary is provided. The input character string is divided into mutually distinct character strings, which are numbered in order of appearance and registered in the dictionary, and the character string currently being input is encoded using only the dictionary number of the longest matching character string registered in the dictionary.

[0006] FIG. 8 is an explanatory diagram of LZW encoding, FIG. 9 is an explanatory diagram of the dictionary structure, and FIG. 10 is a flowchart of the LZW encoding process. To simplify the explanation, assume that a character string consisting of the three characters a, b, and c is to be compressed by LZW encoding. First, the single-character strings (a, b, c) for all characters are given registration numbers and initially registered in the dictionary, and the number of dictionary entries N is set to the number of character types M (M → N) (step 101).

[0007] In this state, the first character K is input, the registration number of that character is taken as the reference number ω, and this is treated as the prefix string (step 102). The next character K of the input data is then read (step 103), and the current dictionary is searched to determine whether it contains the character string (ωK) formed by appending the character K read in step 103 to the prefix string ω obtained in step 102 (step 104).

[0008] If the character string (ωK) exists in the dictionary, (ωK) becomes the new ω (step 105), and it is then determined whether the input data has ended (step 106). If the data has not ended, processing returns to step 103 and the subsequent steps are repeated, so that the search for the longest matching character string continues until (ωK) can no longer be found in the dictionary. If the input data has ended in step 106, the reference number ω is output as the code word code(ω) (step 107) and the encoding process ends.

[0009] If, as the search for the longest matching character string continues, the character string (ωK) is found not to exist in the dictionary in step 104, the reference number ω is output as the code word code(ω), the character string (ωK) is registered at dictionary address N, the character K read in step 103 becomes the new reference number ω, and the dictionary address N is incremented by 1 (step 108). Step 106 then determines whether the input data has ended, and the subsequent processing is repeated according to the result.
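The flow of steps 101 to 108 can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the implementation of the patent: the function name `lzw_encode`, the use of a Python `dict` keyed by (ω, K) pairs, and the sample input string are all choices made for this example. New entries are numbered from M + 1, which matches the registration numbers 4, 5, 6 used in the walk-through of FIGS. 8 and 9. The actual input of FIG. 8 is not reproduced in the text, so the sample string is only an input consistent with that walk-through.

```python
def lzw_encode(data, alphabet):
    """Encode `data` with a plain LZW dictionary of (omega, K) entries."""
    # Step 101: register every single character under numbers 1..M.
    single = {ch: number for number, ch in enumerate(alphabet, start=1)}
    entries = {}                       # (omega, K) -> registration number
    next_number = len(alphabet) + 1    # first free registration number
    codes = []
    if not data:
        return codes, entries

    omega = single[data[0]]            # step 102: first character is the prefix
    for K in data[1:]:                 # step 103: read the next character
        if (omega, K) in entries:      # step 104: is (omega K) registered?
            omega = entries[(omega, K)]    # step 105: extend the match
        else:
            codes.append(omega)            # step 108: emit code(omega),
            entries[(omega, K)] = next_number  # register (omega K),
            next_number += 1
            omega = single[K]              # and restart from K
    codes.append(omega)                # step 107: flush the last match
    return codes, entries


if __name__ == "__main__":
    codes, entries = lzw_encode("ababcbababaaaaaaa", "abc")
    print(codes)
    print(entries)
```

The membership test on `entries` hides the cost of the dictionary search; a naive table would have to be scanned entry by entry at step 104.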

[0010] LZW encoding is explained concretely below with reference to FIGS. 8 and 9. The input data of FIG. 8 is read one character at a time from left to right. When the first character a is read, the dictionary contains no matching character string other than a, so the registration number "1" of a (reference number ω = 1) is output as the code word code(ω). The extended character string ab is then registered in the dictionary with registration number 4; the entry is actually stored in the form "1b". Next, the second character b becomes the head of the input character string. Since the dictionary contains no matching character string other than b, the registration number (reference number) 2 of b is output as the code word, and the extended character string ba is registered in the dictionary with registration number 5, actually in the form "2a".

[0011] The third character a thus becomes the head of the input character string. Since the leading character a exists in the dictionary, it is checked whether the character string "1b", formed by appending the next character b to the registration number 1 of that character, exists. Since "1b" exists, it is then checked whether the character string "4c", formed by appending the next character c to its registration number 4, exists. Since "4c" does not exist, the registration number "4" of the longest matching character string "1b" is output as the code word, and the extended character string "4c" is registered in the dictionary with registration number 6. Encoding and dictionary registration are repeated in the same way until all input characters have been LZW encoded.

[0012] FIG. 11 is a flowchart of the LZW decoding process, which performs the reverse of the encoding operation. As in encoding, the single-character strings (a, b, c) for all characters are given registration numbers and initially registered in the dictionary, and the number of dictionary entries N is set to the number of character types M (M → N) (step 201). Next, the first code CODE is read and set as OLDcode. Since the first code always corresponds to one of the single-character registration numbers already in the dictionary, the character K indicated by the input code CODE (= registration number) is output. The output character K is also stored as char for the exception handling described later (step 202).

[0013] The next code CODE is then read and set as NEWcode (step 203), and it is checked whether the code CODE (= registration number) is defined (registered) in the dictionary (step 204). Normally the input code CODE (= registration number) has already been registered in the dictionary by the processing so far, so the result of step 204 is "NO", and it is next determined whether the registered character string indicated by CODE has the form (ωK), that is, whether it is a concatenation of a reference number ω and a character K (step 205).

[0014] If the registered character string is a concatenation of a reference number ω and a character K, the character K is temporarily pushed onto a stack, the code word code(ω) of the reference number ω (in practice code(ω) = ω) becomes the new CODE, the character count C is incremented by 1 (step 206), and processing returns to step 205. Steps 205 and 206 are repeated recursively until the registered character string indicated by CODE is a single character.

[0015] In step 205, when the character string indicated by CODE is a single character, that is, when the registered character string indicated by CODE is (K), K is output, and the C stacked characters are then popped in LIFO (last in, first out) order and output. In addition, the character string (OLDcode, K), formed by appending the first character K of the character string decoded this time to the code OLDcode used in the previous decoding, is registered in the dictionary under registration number N, and N is then incremented by 1 (N + 1 → N). Further, the first character K of the decoded character string is set as char, and NEWcode becomes OLDcode (step 207).
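The stack-based expansion of steps 205 to 207 can be illustrated with a short sketch. It is an assumption-laden example rather than the patented procedure: the function name `expand` and the representation of the dictionary as a Python `dict` (single characters stored as strings, longer entries as (ω, K) tuples) are hypothetical, and the sample entries 4 = "1b", 5 = "2a", 6 = "4c" are the ones quoted in the encoding walk-through.

```python
def expand(code, dictionary):
    """Expand one code into its character string using a LIFO stack."""
    stack = []
    # Steps 205/206: while the entry has the form (omega, K), push K and
    # continue with omega, until a single-character entry is reached.
    while isinstance(dictionary[code], tuple):
        omega, K = dictionary[code]
        stack.append(K)
        code = omega
    count = len(stack)                 # the character count C of the text
    # Step 207: output the single character first, then pop the stack.
    output = [dictionary[code]]
    while stack:
        output.append(stack.pop())
    return "".join(output), count


if __name__ == "__main__":
    sample = {1: "a", 2: "b", 3: "c", 4: (1, "b"), 5: (2, "a"), 6: (4, "c")}
    print(expand(6, sample))
```

The stack is needed because each entry stores only its last character explicitly, so the string is recovered back to front.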

[0016] It is then determined whether code input has ended (step 208); if not, processing returns to step 203, the next code is read, and decoding is repeated. In the encoding process, the encoding of a character string and the dictionary registration of that character string extended by the next leading character are performed at the same time, so the code word of the character string encoded immediately before can be used in the very next encoding step. In the decoding process, however, the character string registered in the dictionary is the previously decoded character string extended by the first character of the character string decoded this time, so dictionary registration lags one step behind the encoding process. Consequently, when the encoder uses the code word of the character string encoded immediately before, the decoder may encounter a code word that is not yet registered (defined). This is the case in which CODE is undefined in step 204, giving "YES".

[0017] For example, as shown in FIG. 12, suppose that during encoding OLDcode is output for the character string "a...z" and the character string "a...za" is registered in the dictionary as NEWcode, and that the next character string "a...za" is then output as NEWcode and the character string "a...zab" is registered in the dictionary. When the decoder reads the code word NEWcode, that code word has not yet been registered in the decoder's dictionary and cannot be decoded directly. Comparing NEWcode and OLDcode, however, gives the following relationship:

character string of NEWcode = character string of OLDcode + first character (char) of the character string of OLDcode

Therefore, when CODE is found to be undefined in step 204, the stored char is pushed onto the stack, OLDcode is treated as CODE, the character string formed by appending char to OLDcode is taken as NEWcode (step 209), and the processing from step 205 onward is performed using CODE.

[0018] The decoding process is explained concretely below with reference to FIG. 13. The first input code is "1". Since the single characters a, b, and c are already registered in the dictionary under registration numbers 1, 2, and 3 (as in FIG. 9), the dictionary is consulted and code "1" is replaced by the character string a of the matching registration number and output. The next code "2" is likewise replaced by the character b and output. At this point, "1b", which combines the previously processed code "1" with the first character b decoded this time, is registered in the dictionary under the new registration number 4.

[0019] The third code "4" is found by dictionary lookup to be "1b", which is expanded to "ab", and the character string "ab" is output. At the same time, the character string "2a" (= "ba"), which combines the previously processed code "2" with the first character a decoded this time, is registered in the dictionary under the new registration number 5. Decoding is repeated in the same way. The exception handling of step 209 in FIG. 11 occurs when the sixth input code "8" is decoded. Code "8" is not defined in the dictionary at the time of decoding and cannot be decoded directly. In this case, the character string "5b" is formed by appending the first character b of the previously decoded character string "ba" to the previously processed code "5"; this is expanded to "2ab" and then "bab" and output. The character string "5b", formed by appending the character b of the character string decoded this time to the previous code word "5", is then registered in the dictionary with registration number "8".
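A compact decoder that includes the exception case can be sketched as below. This is only an illustration under assumptions: the function name `lzw_decode` is hypothetical, the table stores each entry as a fully expanded string instead of the (ω, K) form used in the text (which avoids the stack at the cost of memory), and the code sequence is a sample whose first six codes (1, 2, 4, 3, 5, 8) agree with the walk-through; the codes of FIG. 13 are not reproduced in the text.

```python
def lzw_decode(codes, alphabet):
    """Decode LZW codes, handling a code that is not yet registered."""
    table = {number: ch for number, ch in enumerate(alphabet, start=1)}
    next_number = len(alphabet) + 1
    if not codes:
        return ""

    old = codes[0]                     # OLDcode: always a single character
    pieces = [table[old]]
    for new in codes[1:]:              # NEWcode
        if new in table:
            text = table[new]          # normal case: entry already defined
        elif new == next_number:
            # Exception case: the encoder used the entry it had just
            # registered. Its string is OLDcode's string followed by the
            # first character of OLDcode's string.
            text = table[old] + table[old][0]
        else:
            raise ValueError(f"invalid code {new}")
        pieces.append(text)
        # Register OLDcode's string extended by the first decoded character.
        table[next_number] = table[old] + text[0]
        next_number += 1
        old = new
    return "".join(pieces)


if __name__ == "__main__":
    print(lzw_decode([1, 2, 4, 3, 5, 8, 1, 10, 11, 1], "abc"))
```

In this sample the sixth code, 8, reaches the decoder before entry 8 exists, and it is reconstructed as "ba" + "b" = "bab", as in the description of step 209.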

[0020] As described above, universal coding is a compression method that encodes data while learning its properties, even when those properties are unknown in advance. The scheme is simple: data sequences that have already appeared are registered in a dictionary, and when the same data sequence appears again, its registration number is sent as the encoded data (code word). However, when encoding follows the flowchart of FIG. 10, searching the dictionary for a single character string may, in the worst case, require searching the entire dictionary, so the encoding process takes a long time. Conventionally, therefore, external hashing (open hashing, also called chaining) has been used for the dictionary search to raise the processing speed (see, for example, Information Processing Handbook, edited by the Information Processing Society of Japan, published by Ohmsha).

[0021] External hashing

Consider a set S of character strings. A fast search is possible if the address of the storage location of a character string x in S can be computed directly from x. Hashing achieves this. Suppose the storage locations (the hash table) are assigned addresses 0 to (m-1). In hashing, one function h: S → [0, 1, 2, ..., (m-1)] is fixed, and the address of a character string x in S is obtained as h(x). The function h is called the hash function, and the value h(x) is called the hash address of x. Hashing is normally used when the size of the set S is far larger than the number of addresses m. Consequently, however the hash function h is chosen, h(x₁) = h(x₂) can occur for distinct character strings x₁ and x₂ in S. This is called a collision, and external hashing is one countermeasure against collisions. As shown in FIG. 14, external hashing prepares a linked list (name, next) LST for each hash address i, and each x with h(x) = i is stored in order from the head of that linked list. Each list of entries sharing the same hash address is called a bucket.
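External hashing as described here can be illustrated with a small chained hash table. The sketch below is a generic example, not the table of FIG. 14: the class names `Cell` and `ChainedHashTable`, the hash function (sum of character codes modulo m), and the table size are all assumptions chosen so that a collision is easy to reproduce.

```python
class Cell:
    """One element of a bucket's linked list: (name, next)."""

    def __init__(self, name):
        self.name = name
        self.next = None


class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.heads = [None] * m        # one linked list per hash address

    def h(self, x):
        # Illustrative hash function: sum of character codes modulo m.
        return sum(ord(ch) for ch in x) % self.m

    def insert(self, x):
        i = self.h(x)
        cell = self.heads[i]
        if cell is None:
            self.heads[i] = Cell(x)
            return
        while True:                    # walk the bucket from its head
            if cell.name == x:
                return                 # already stored
            if cell.next is None:
                break
            cell = cell.next
        cell.next = Cell(x)            # append at the end of the bucket

    def contains(self, x):
        cell = self.heads[self.h(x)]
        while cell is not None:
            if cell.name == x:
                return True
            cell = cell.next
        return False

    def bucket(self, i):
        names, cell = [], self.heads[i]
        while cell is not None:
            names.append(cell.name)
            cell = cell.next
        return names


if __name__ == "__main__":
    table = ChainedHashTable(4)
    for word in ("ab", "ba", "abc"):
        table.insert(word)
    print(table.bucket(table.h("ab")))
```

With this hash function "ab" and "ba" always collide, since they contain the same characters, so both end up in one bucket and a lookup has to walk that bucket's list.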

[0022] FIG. 15 shows the data structure of the hash table (dictionary) when external hashing is used for creating and searching the LZW dictionary. The hash address i designated by a character string x stores the character K that follows x (the extension), the address that stores a character other than K that follows x (the next address), and the address that stores the character that further follows K (the first address). The first address corresponds to the index dictionary of FIG. 14, and the next address corresponds to the linked list (name, next).

[0023] FIG. 16 illustrates the dictionary structure under external hashing: (a) is the dictionary produced by conventional LZW encoding, (b) is the dictionary produced with external hashing, and (c) is the tree structure of the dictionary built with external hashing, each for the case where an input character string made up of the three characters a, b, and c occurs in the order shown in FIG. 8. Each address i in FIG. 16(b) has a first field, a next field, and an extension field, and stores data in the structure shown in FIG. 15. That is, the extension field of address i holds the character K appended to the character string x that designates address i, the next field holds the address that stores a character other than K appended to x, and the first field holds the address that stores the character further appended after K (the first address). For example, consider the character b at address 4. Address 4 is pointed to by the first address of the character a (a single-character string) at address 1. The extension field of address 4 holds the character b appended to the character string a, its next field holds address 10, which stores another character a appended to the character string a, and its first field holds address 6 of the character c that follows b.

[0024] Initially, all single-character strings a, b, and c are registered in the extension fields of addresses 1, 2, and 3, and the other fields are "empty (= 0)". The encoding process using external hashing, described later, is then performed, and the dictionary (FIG. 16(b)) is built in the tree form shown in FIG. 16(c). In (c), the numbers enclosed in boxes are addresses. Thus, for example, looking at the character a at address 1, the character b at address 4 is linked to it in the first direction, and the character c at address 6 is linked to that b in the first direction. In addition, the character a at address 10 is linked to the character a at address 1, and the characters a at addresses 11 and 12 are linked in turn to the character a at address 10. Likewise, for the character b at address 2, the character a at address 5 is linked in the first direction, followed in turn by the characters b and a at addresses 8 and 9. For the character c at address 3, the character b at address 7 is linked in the first direction.
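The first/next/extension fields can be made concrete with a small sketch that hard-codes the links named in the text and walks them. It is an illustration only: the names `EXT`, `FIRST`, `NEXT`, `children`, `find_child`, and `registered_strings` are hypothetical, Python dictionaries stand in for the table of FIG. 16(b), and the phrase "linked in turn" is read here as a chain of first-direction links (10 to 11 to 12, and 5 to 8 to 9), which is an assumption.

```python
# Field contents as described in the text; a missing key means "empty (= 0)".
EXT = {1: "a", 2: "b", 3: "c", 4: "b", 5: "a", 6: "c",
       7: "b", 8: "b", 9: "a", 10: "a", 11: "a", 12: "a"}
FIRST = {1: 4, 2: 5, 3: 7, 4: 6, 5: 8, 8: 9, 10: 11, 11: 12}
NEXT = {4: 10}


def children(addr):
    """Yield the addresses of all one-character extensions of `addr`."""
    child = FIRST.get(addr, 0)         # first direction: first extension
    while child != 0:
        yield child
        child = NEXT.get(child, 0)     # next direction: sibling extensions


def find_child(addr, K):
    """Return the address of the extension of `addr` by K, or 0 if absent."""
    for child in children(addr):
        if EXT[child] == K:
            return child
    return 0


def registered_strings(roots=(1, 2, 3)):
    """Map every address to the character string it represents."""
    result = {}

    def walk(addr, prefix):
        text = prefix + EXT[addr]
        result[addr] = text
        for child in children(addr):
            walk(child, text)

    for root in roots:
        walk(root, "")
    return result


if __name__ == "__main__":
    for addr, text in sorted(registered_strings().items()):
        print(addr, text)
```

Searching for an extension of a string therefore touches only the siblings under one parent, not the whole dictionary.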

[0025] Encoding process using external hashing

FIG. 17 is a flowchart of the LZW encoding process using external hashing. In this encoding process, the address of the character string formed by appending one character to the character string with reference number i is looked up as a hash address (index) by external hashing. The linked list holds the first and next addresses that store the characters appended to the character string with reference number i. The stored character is compared with the input character K, and if they do not match, the linked list is followed step by step, so that all one-character extensions that have appeared so far can be searched. If the extended character string does not exist in the bucket, 0 is eventually obtained as the link address of the list, which shows that the character string is not registered.

[0026] First, the single-character strings (a, b, c, ...) for all characters are initially registered in the extension fields of dictionary addresses 1 to M (M is the number of character types), and the head address n of the dictionary is set to M + 1 (M + 1 → n). The first character K is input, the address that stores that character (the reference number of the character) is set as i, and this is treated as the prefix string. Further, the contents first[1, NMAX] of the first fields and the contents next[1, NMAX] of the next fields of all addresses in the dictionary, and the contents of the extension fields of addresses M + 1 to NMAX, are all initialized to 0 (step 301).

[0027] In this state, the next character K is input (step 302), i is assigned to ω (i → ω; ω is the reference number of the character string up to just before K), and j is set to 0 (step 303). Then first(i), the data that indicates the address storing the candidate character linked in the first direction to the candidate character ext(i) at the current address i, becomes the new i (step 304). If no character is linked in the first direction to the candidate character ext(i) at the current address i, first(i) = 0 and therefore i = 0.

[0028] Next, it is determined whether i = 0, in other words whether a candidate character linked in the first direction exists (step 305). If none exists, the reference number (address) ω saved in step 303 is output as the code word code(ω) (step 306).

[0029] After that, i is set to n, n is incremented by 1 (n + 1 → n), and the character K input in step 302 is written to the extension field of address i (K → ext(i)). That is, the continuation character K is registered in the dictionary (step 307). It is then checked whether j = 0 (step 308); if j = 0, i is written to first(ω) (i → first(ω)) (step 309). As a result, i (the address that stores the current character K) is written to the first field of the address that stores the character input just before K (= the address indicated by the reference number ω of the character string up to the character input just before K).

[0030] The address of the character K input in step 302 is then set as i (step 310), and it is checked whether the data has ended (step 311). If it has ended, i is assigned to ω (i → ω), ω is output as the code word code(ω) (step 312), and the encoding process ends. If the data has not ended, processing returns to step 302 and the subsequent steps are repeated.

[0031] On the other hand, if i ≠ 0 in step 305, in other words if a candidate character linked in the first direction exists, it is checked whether that character (the character ext(i) written in the extension field of address i) matches the character K input in step 302 (step 313). If they match, processing jumps to step 311. If the data has ended, i is assigned to ω (i → ω), ω is output as the code word code(ω) (step 312), and the encoding process ends. If the data has not ended, processing returns to step 302, the next character is input, and the search for the longest matching character string is repeated.

【００３２】ステップ３１３において、first方向に連
結する候補文字がステップ３０２で入力した文字Ｋと一
致してなければ、ｊにｉを代入すると共に、アドレスｉ
のnext欄に書き込まれているアドレスデータnext(i)を
新たなｉとし（ステップ３１４）、ステップ３０５に戻
る。尚、next方向に連結する文字がなければアドレスｉ
のnext欄には０が書き込まれており、ｉ＝０となる。In step 313, if the candidate character linked in the first direction does not match the character K input in step 302, i is assigned to j, the address data next(i) written in the next field of address i becomes the new i (step 314), and the process returns to step 305. If no character is linked in the next direction, 0 is written in the next field of address i, so i = 0.

【００３３】以後、ｉ≠０であればステップ３１３に移
行し同様の最長一致文字列の検索処理が繰り返えされ、
最早一致文字が存在しなくなるとステップ３０５におい
てｉ＝０となり、ステップ３０３で保存した参照番号
（アドレス）ωを符号語 code（ω）として出力し、前
述の処理を繰り返す。尚、ステップ３１４の処理の直後
のステップ３０５でｉ＝０が判断されると、ステップ３
０８においてｊ≠０となり、ｉ→next(ω)とされる（ス
テップ３１５）。これにより、Ｋの直前に入力した文字
迄の参照番号ωが指示するアドレスのnext欄にｉ（今回
の文字Ｋを格納するアドレス）が書き込まれることにな
る。Thereafter, if i ≠ 0, the process moves to step 313 and the same longest-match search is repeated. When no matching character remains, i = 0 in step 305, the reference number (address) ω saved in step 303 is output as the codeword code(ω), and the processing described above is repeated. If i = 0 is found in step 305 immediately after step 314, then j ≠ 0 in step 308, and i is written into next(ω) (i → next(ω)) (step 315). As a result, i (the address that stores the current character K) is written into the next field of the address indicated by the reference number ω of the string up to the character input immediately before K.

【００３４】以上要約すれば、新たな文字Ｋを入力した
時、それ迄の文字列に連結する候補文字をfirst方向に
求め、見つかればfirst方向に同様に求めて行き、見つ
からなくなればnext方向に調べ、見つかれば、再びfirs
t方向に調べて行き、以後同様な処理を繰り返して見つ
からなくなった時の参照番号ｉをωとして最長一致文字
列の符号語code(ω)を出力すると共に、アドレスｉに最
新の入力文字についてのfirst, next, extension等を登
録するものである。以上の流れ図に従って、図８の最上
段に示す文字列を符号化出力してゆくと、最下段の如く
文字列が辞書登録されて行き、図１８、図１９、図２０
の斜線で示すように辞書登録量が増加して行く。尚、図
１８(a)は初期化された後の状態である。In summary, when a new character K is input, a candidate character linked to the string so far is sought in the first direction. If one is found, the search continues in the first direction in the same way; if none is found, the next direction is examined, and if a match is found there, the search resumes in the first direction. The same processing is repeated, and when no match can be found, the reference number i at that point is taken as ω, the codeword code(ω) of the longest matching character string is output, and first, next, extension, and so on for the latest input character are registered at address i. When the character string shown in the top row of FIG. 8 is encoded and output according to this flowchart, character strings are registered in the dictionary as shown in the bottom row, and the amount registered in the dictionary grows as shown by the hatching in FIGS. 18, 19, and 20. FIG. 18(a) shows the state after initialization.
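The conventional flow of steps 301 to 315 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `lzw_encode`, the list-based fields, and the `addr_of` lookup table are hypothetical. Where the text writes i → next(ω) in step 315, the sketch instead links the new entry to the last sibling visited (j), which is an assumption made so that existing sibling links are not overwritten.

```python
def lzw_encode(text, alphabet):
    """Encode text with a first/next/ext dictionary (address 0 means 'no link')."""
    m = len(alphabet)
    size = m + len(text) + 1
    first = [0] * size    # address of the first child (first direction)
    nxt = [0] * size      # address of the next sibling (next direction)
    ext = [None] * size   # extension character stored at each address
    addr_of = {}
    for addr, ch in enumerate(alphabet, start=1):   # initial single characters
        ext[addr] = ch
        addr_of[ch] = addr
    n = m + 1             # next free dictionary address
    codes = []
    if not text:
        return codes
    i = addr_of[text[0]]
    for k in text[1:]:                   # step 302: input K
        omega, j = i, 0                  # step 303
        i = first[omega]                 # step 304
        while i != 0 and ext[i] != k:    # steps 305, 313, 314
            j, i = i, nxt[i]
        if i == 0:                       # no candidate matches
            codes.append(omega)          # step 306: output code(omega)
            i = n                        # step 307: register K at a new address
            n += 1
            ext[i] = k
            if j == 0:
                first[omega] = i         # step 309
            else:
                nxt[j] = i               # step 315 (assumed to link from j)
            i = addr_of[k]               # step 310
    codes.append(i)                      # step 312: final codeword
    return codes
```

With the alphabet "abc" mapped to addresses 1 to 3, encoding "abab" registers ab at address 4 and ba at address 5, and the final ab is emitted as the single reference number 4.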

【００３５】図２１は従来の外部ハッシュ法による辞書
検索回路の構成図である。ＭＰＵ（マイクロ・プロセッ
サ・ユニット）１は入力文字Ｋを読み込んで一致検査部
２のレジスタ２ａに格納すると共に、辞書メモリ３より
候補文字Ｋ′とそれに繋がるfirstアドレスｆωとnext
アドレスｎωを読み出し、それぞれ読み込み部４のレジ
スタ４ａ，４ｂ，４ｃにラッチする。一致検査部２の比
較回路２ｂは入力文字Ｋとレジスタ４ａにラッチされた
候補文字Ｋ′が一致するか比較検査を行う。一致しない
場合には、コントローラ５をしてマルチプレクサ（ＭＰ
Ｘ）４ｄにより、レジスタ４ｃにラッチされているnext
アドレスｎωを選択させる。これにより、ＭＰＵ１はne
xtアドレスｎωで辞書検索を行い、新たな候補文字Ｋ′
とそれに繋がるfirstアドレスｆωとnextアドレスｎω
を読み出し、それぞれ読み込み部４のレジスタ４ａ，４
ｂ，４ｃにラッチして比較検査を行う。FIG. 21 is a block diagram of a conventional dictionary search circuit using the external hash method. The MPU (microprocessor unit) 1 reads the input character K and stores it in register 2a of the match checking unit 2. It also reads a candidate character K′, together with the first address fω and the next address nω linked to it, from the dictionary memory 3 and latches them in registers 4a, 4b, and 4c of the reading unit 4, respectively. The comparison circuit 2b of the match checking unit 2 checks whether the input character K matches the candidate character K′ latched in register 4a. If they do not match, the controller 5 causes the multiplexer (MPX) 4d to select the next address nω latched in register 4c. The MPU 1 then searches the dictionary at the next address nω, reads a new candidate character K′ and the first address fω and next address nω linked to it, latches them in registers 4a, 4b, and 4c of the reading unit 4, and the comparison is performed again.

【００３６】一方、比較回路２ｂにおいて、入力文字Ｋ
と候補Ｋ′が一致した場合には、コントローラ５をして
マルチプレクサ４ｄにより、レジスタ４ｂにラッチされ
ているfirstアドレスｆω選択させる。これにより、Ｍ
ＰＵ１はfirstアドレスｆωで辞書検索を行い、新たな
候補文字Ｋ′とそれに繋がるfirstアドレスｆωとnext
アドレスｎωを読み出し、それぞれ読み込み部４のレジ
スタ４ａ，４ｂ，４ｃにラッチすると共に、次の入力文
字Ｋを読み取ってレジスタ２ａに格納し、以後上記の比
較検査を行う。On the other hand, if the input character K and the candidate character K′ match in the comparison circuit 2b, the controller 5 causes the multiplexer 4d to select the first address fω latched in register 4b. The MPU 1 then searches the dictionary at the first address fω, reads a new candidate character K′ and the first address fω and next address nω linked to it, and latches them in registers 4a, 4b, and 4c of the reading unit 4. It also reads the next input character K and stores it in register 2a, after which the comparison described above is performed.

【００３７】以後、上記処理が行われ、比較回路２ｂで
一致が取れず、しかも、マルチプレクサ４ｄの出力が０
となれば、換言すれば連結検出部６において検索すべき
firstアドレスｆωとnextアドレスｎωがもうないと確
認されると、最長一致文字列の検索が終了し、この時点
で辞書検索をストップし、以後次の入力文字に対して最
長一致文字列の検索を行う。Thereafter, the above processing is repeated. When the comparison circuit 2b finds no match and the output of the multiplexer 4d is 0, in other words, when the connection detection unit 6 confirms that there is no further first address fω or next address nω to search, the search for the longest matching character string is finished. The dictionary search stops at this point, and the longest-match search then starts again from the next input character.

【００３８】[0038]

【発明が解決しようとする課題】以上のように、外部ハ
ッシュ法によるＬＺＷ符号化処理においては、ある文字
列の末尾に連結する候補文字Ｋ′のアドレスが指定さ
れ、該アドレスに候補文字Ｋ′とfirstアドレスとnext
アドレスが格納されているため、従来の外部ハッシュ法
によらないＬＺＷ符号化に比べて辞書検索を高速に行え
る利点がある。しかし、上記外部ハッシュ法による辞書
検索では、１度の辞書アクセスに対して１つの候補文字
Ｋ′と１組のfirstアドレスとnextアドレスしか読み出
すことができないため、候補文字が多い場合検索一致に
時間が掛かる問題がある。As described above, in LZW encoding using the external hash method, the address of a candidate character K′ linked to the end of a given character string is specified, and the candidate character K′, a first address, and a next address are stored at that address. This has the advantage that the dictionary can be searched faster than in conventional LZW encoding that does not use the external hash method. However, in a dictionary search using the external hash method, only one candidate character K′ and one pair of first and next addresses can be read per dictionary access, so finding a match takes time when there are many candidate characters.

【００３９】以上から本発明の目的は、外部ハッシュ法
による辞書検索を高速に行える辞書検索方法を提供する
ことである。本発明の別の目的は、外部ハッシュ法によ
るＬＺＷ符号化の辞書検索において、一度の辞書検索に
より複数の候補文字を読み出し、複数の候補文字と１つ
の入力文字とを一度に照合して辞書検索を高速に行える
辞書検索方法を提供することである。Accordingly, an object of the present invention is to provide a dictionary search method that can perform a dictionary search by the external hash method at high speed. Another object of the present invention is to provide a dictionary search method for LZW encoding by the external hash method in which a plurality of candidate characters are read in a single dictionary access and compared with one input character at once, so that the dictionary search is performed at high speed.

【００４０】本発明の更に別の目的は、一度の辞書検索
により複数の候補文字と共に、複数のアドレスを読み出
し、複数の候補文字と１つの入力文字との比較照合結果
（第１候補と一致、第２候補と一致、いずれの候補とも
一致せず等）に基づいて次に参照すべき複数の候補文字
を直ちに前記所定アドレスから読み出して比較照合して
辞書検索を高速に行える辞書検索方法を提供することで
ある。Still another object of the present invention is to provide a dictionary search method in which a plurality of addresses are read together with a plurality of candidate characters in a single dictionary access, and the next candidate characters to be referred to are read immediately from the predetermined address according to the result of comparing the candidate characters with one input character (match with the first candidate, match with the second candidate, match with neither, and so on), so that the dictionary search is performed at high speed.

【００４１】[0041]

【課題を解決するための手段】図１は本発明の原理説明
図である。１１は検索済文字列に連結する複数の候補文
字が検索可能となるように複数のデータ要素を前記検索
済文字列が指定するアドレスに格納して符号化済みの部
分文字列を記憶する辞書メモリ、１２は入力文字を読み
込んだり、辞書メモリより候補文字、アドレス等を読み
出したり、新規文字列を辞書メモリに登録するＭＰＵ
（プロセッサ）、１３は辞書メモリより同時に読み出し
た複数のデータを記憶するレジスタ部、１４は１つの入
力文字と複数の候補文字との一致照合を行う比較照合
部、１５は比較結果に基づいて次の複数の候補文字のア
ドレスを選択するアドレス選択部（マルチプレクサＭＰ
Ｘ）である。前記複数のデータ要素は、例えば、 (1) 検索済文字列に連結する第１文字(ext₁)と、(2) 第
１文字迄の文字列の番号（ω₁）と、(3) 前記検索済文
字列に連結する文字であって第１文字とは別の第２文字
（ext ₂)と、 (4) 第２文字までの文字列の番号（ω₂）
と、(5) 前記検索済文字列に連結する第１、第２文字と
は別の第３文字の格納アドレス(next₂)と、(6) 第１文
字に連結する第４文字の格納アドレス(first₁)と、(7)
第２文字に連結する第５文字の格納アドレス(first₂)
と、(8) 第１、第２文字のうち幾つ記憶されているかを
示すフラグ(flag)を有している。FIG. 1 illustrates the principle of the present invention. Reference numeral 11 denotes a dictionary memory that stores encoded partial character strings by storing a plurality of data elements at the address specified by a searched character string, so that a plurality of candidate characters linked to that searched character string can be retrieved; 12 denotes an MPU (processor) that reads input characters, reads candidate characters, addresses, and the like from the dictionary memory, and registers new character strings in the dictionary memory; 13 denotes a register unit that stores the plurality of data read simultaneously from the dictionary memory; 14 denotes a comparison/matching unit that matches one input character against a plurality of candidate characters; and 15 denotes an address selection unit (multiplexer MPX) that selects the address of the next plurality of candidate characters based on the comparison result. The plurality of data elements include, for example, (1) a first character (ext₁) linked to the searched character string, (2) the number (ω₁) of the character string up to the first character, (3) a second character (ext₂) that is linked to the searched character string and is different from the first character, (4) the number (ω₂) of the character string up to the second character, (5) the storage address (next₂) of a third character that is linked to the searched character string and is different from the first and second characters, (6) the storage address (first₁) of a fourth character linked to the first character, (7) the storage address (first₂) of a fifth character linked to the second character, and (8) a flag (flag) indicating how many of the first and second characters are stored.

【００４２】[0042]

【作用】検索済文字列に連結する複数の候補文字が検索
可能となるように複数のデータ要素を前記検索済文字列
が指定する辞書メモリ１１のアドレスに記憶して辞書を
作成し、最長一致文字列の検索に際して、ＭＰＵ１２は
検索済文字列の次の１つの入力文字を読み込むと共に、
該検索済文字列に連結する複数の候補文字及び次の候補
文字の位置を指定するアドレスデータを含むデータ要素
を辞書メモリ１１より一括して読み出してレジスタ部１
３に格納する。比較照合部１４は、複数個の候補文字と
１つの入力文字とを比較して一致照合を行い、一致する
候補文字が存在する場合には、アドレスデータに基づい
て次の複数の候補文字を含むデータ要素を辞書メモリか
ら読み出してレジスタ部１３に格納し、以後次の入力文
字と次の複数の候補文字とを比較して最長一致検索処理
を続行する。このように、一度の辞書検索により複数の
候補文字を読み出し、複数の候補文字と１つの入力文字
とを一度に照合して辞書検索を行うようにしたから、辞
書検索を高速に行うことができる。又、一度の辞書検索
により複数の候補文字と共に、複数のアドレスを読み出
し、複数の候補文字と１つの入力文字との比較照合結果
（第１候補と一致、第２候補と一致、いずれの候補とも
一致せず等）に基づいて次に参照すべき複数の候補文字
を直ちに前記所定アドレスから読み出して比較照合して
辞書検索を高速に行うことができる。A dictionary is created by storing a plurality of data elements at the address of the dictionary memory 11 specified by a searched character string, so that a plurality of candidate characters linked to the searched character string can be retrieved. When searching for the longest matching character string, the MPU 12 reads the one input character that follows the searched character string and reads from the dictionary memory 11, in a single access, the data elements containing the plurality of candidate characters linked to the searched character string and the address data specifying the location of the next candidate characters, and stores them in the register unit 13. The comparison/matching unit 14 compares the plurality of candidate characters with the one input character. If a matching candidate character exists, the data elements containing the next plurality of candidate characters are read from the dictionary memory based on the address data and stored in the register unit 13, and the longest-match search continues by comparing the next input character with the next plurality of candidate characters. Because a plurality of candidate characters are read in a single dictionary access and compared with one input character at once, the dictionary search can be performed at high speed. In addition, a plurality of addresses are read together with the candidate characters in a single dictionary access, and the next candidate characters to be referred to are read immediately from the predetermined address according to the comparison result (match with the first candidate, match with the second candidate, match with neither, and so on), so the dictionary search can be performed at high speed.

【００４３】更に、前記データ要素は、検索済文字列に
連結する第１文字(ext₁)と、前記検索文字に連結する文
字であって第１文字とは別の第２文字（ext₂)と、前記
検索文字列に連結する第１、第２文字とは別の第３文字
の格納アドレス(next₂)と、第１文字に連結する第４文
字の格納アドレス(first₁)と、第２文字に連結する第５
文字の格納アドレス(first₂)と、第１、第２文字のうち
幾つ記憶されているかを示すフラグ(flag)を少なくとも
有し、(1)１つの入力文字と複数の候補文字である前記
第１、第２文字の一致照合に際して、入力文字と第１、
第２文字が共に異なる場合にはアドレス（next₂)に基づ
き次のデータ要素を読み出して該データ要素が指示する
複数の候補文字と入力文字との一致照合を行い、(2)入
力文字と第１文字が一致する場合には、アドレス（firs
t₁)に基づいて次の入力文字に対するデータ要素を読み
出し、該データ要素が指示する複数の候補文字と次の入
力文字との一致照合を行い、(3)入力文字と第２文字が
一致する場合には、アドレス(first₂)に基づいて次の入
力文字に対するデータ要素を読み出し、該データ要素が
指示する複数の候補文字と次の入力文字との一致照合を
行って最長一致検索処理を続行する。このようにすれ
ば、一度の辞書検索により複数の候補文字と共に、複数
のアドレスを読み出し、複数の候補文字と１つの入力文
字との比較照合結果（第１候補と一致、第２候補と一
致、いずれの候補とも一致せず等）に基づいて次に参照
すべき複数の候補文字を直ちに前記所定アドレスから読
み出して比較照合を連続的に行え、辞書検索を高速に行
うことができる。Further, the data elements include at least a first character (ext₁) linked to the searched character string, a second character (ext₂) that is linked to the searched character string and is different from the first character, the storage address (next₂) of a third character that is linked to the searched character string and is different from the first and second characters, the storage address (first₁) of a fourth character linked to the first character, the storage address (first₂) of a fifth character linked to the second character, and a flag (flag) indicating how many of the first and second characters are stored. When one input character is matched against the first and second characters, which are the plurality of candidate characters: (1) if the input character differs from both the first and second characters, the next data elements are read based on the address (next₂) and the candidate characters they indicate are matched against the input character; (2) if the input character matches the first character, the data elements for the next input character are read based on the address (first₁) and the candidate characters they indicate are matched against the next input character; and (3) if the input character matches the second character, the data elements for the next input character are read based on the address (first₂) and the candidate characters they indicate are matched against the next input character. The longest-match search continues in this way. With this arrangement, a plurality of addresses are read together with a plurality of candidate characters in a single dictionary access, and the next candidate characters to be referred to are read immediately from the predetermined address according to the comparison result (match with the first candidate, match with the second candidate, match with neither, and so on), so that comparisons can be performed continuously and the dictionary search can be performed at high speed.
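The three-way selection described above can be illustrated with a short Python sketch. The names `Element` and `select_next` are hypothetical, and the flag is modeled here as a count of stored candidates (0, 1, or 2) rather than the 0-0 / 1-0 / 1-1 notation used later in the text; both choices are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Element:
    ext1: str = ""    # first candidate character
    ext2: str = ""    # second candidate character
    first1: int = 0   # address of the element that continues after ext1
    first2: int = 0   # address of the element that continues after ext2
    next2: int = 0    # address of the element holding further candidates
    flag: int = 0     # number of candidates stored here (assumed encoding)


def select_next(e, k):
    """Compare k with both candidates at once; return (matched, T, next_address)."""
    if e.flag >= 1 and e.ext1 == k:
        return True, 0, e.first1    # case (2): first candidate matches
    if e.flag == 2 and e.ext2 == k:
        return True, 1, e.first2    # case (3): second candidate matches
    return False, None, e.next2     # case (1): neither matches, follow next2
```

In hardware the two comparisons would happen in parallel and a multiplexer would pick among first₁, first₂, and next₂; the sequential `if` statements here only model the outcome of that selection.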

【００４４】[0044]

【実施例】図２は本発明に係わる辞書メモリの１つのア
ドレスに格納されるデータの構造説明図である。ある文
字列ｘにより指定されるアドレスｉ（＝ω₁）には、 (1) 文字列ｘの最終文字に連結する第１文字(ext₁)と、
(2) 第１文字迄の文字列の番号（ω₁）と、(3) 前記最
終文字に連結する文字であって第１文字とは別の第２文
字（ext₂)と、 (4) 第２文字までの文字列の番号
（ω₂）と、(5) 前記最終文字に連結する第１、第２文
字とは別の第３文字の格納アドレス(next₂)と、(6) 第
１文字に連結する第４文字の格納アドレス(first₁)と、
(7) 第２文字に連結する第５文字の格納アドレス(first
₂)と、(8) 第１、第２文字のうち幾つ記憶されているか
を示すフラグ(flag)が記憶されて、辞書が作成される。FIG. 2 is an explanatory diagram of the structure of the data stored at one address of the dictionary memory according to the present invention. At the address i (= ω₁) specified by a character string x, the following are stored to create the dictionary: (1) a first character (ext₁) linked to the last character of the character string x, (2) the number (ω₁) of the character string up to the first character, (3) a second character (ext₂) that is linked to the last character and is different from the first character, (4) the number (ω₂) of the character string up to the second character, (5) the storage address (next₂) of a third character that is linked to the last character and is different from the first and second characters, (6) the storage address (first₁) of a fourth character linked to the first character, (7) the storage address (first₂) of a fifth character linked to the second character, and (8) a flag (flag) indicating how many of the first and second characters are stored.

【００４５】図３は本発明による辞書メモリの内容説明
図であり、(a)は符号化説明図、(b)は本発明の辞書であ
り、辞書メモリの各アドレスにはにはflag欄、first
₁欄、first₂欄、next₂欄、ext₁欄、ext₂欄、ω₁欄、ω₂
欄が設けられている。図３(a)の上段に示す順序でａ，
ｂ，ｃの３文字よりなる入力文字列が発生すると、後述
する符号化処理により符号語が中段に示すように出力さ
れ、又、下段に示すように文字列が辞書登録される。こ
の辞書登録において、文字列は図２のデータ構造で辞書
メモリの各アドレスに登録され、その内容は図３(b)に
示すようになり、図２の表記法により表現すると図３
(c)に示す木構造状になる。FIG. 3 illustrates the contents of the dictionary memory according to the present invention: (a) illustrates the encoding and (b) shows the dictionary of the present invention. Each address of the dictionary memory has a flag field, a first₁ field, a first₂ field, a next₂ field, an ext₁ field, an ext₂ field, an ω₁ field, and an ω₂ field. When an input character string consisting of the three characters a, b, and c occurs in the order shown in the upper row of FIG. 3(a), the encoding process described later outputs the codewords shown in the middle row and registers the character strings in the dictionary as shown in the lower row. In this dictionary registration, each character string is registered at an address of the dictionary memory in the data structure of FIG. 2. The resulting contents are as shown in FIG. 3(b), and, expressed in the notation of FIG. 2, form the tree structure shown in FIG. 3(c).

【００４６】例えば、アドレス１の第１文字ａ（ext₁)
を参照すると、該第１文字ａはfirst ₁方向にアドレス４
（first₁アドレス）のext₁欄、ext₂欄に格納された文字
等に連結し、該第１文字ａまでの文字列（１文字列ａ）
の参照番号（＝１）がω₁欄に格納されていることが示
される。For example, referring to the first character a (ext₁) at address 1, this first character a is linked in the first₁ direction to the characters stored in the ext₁ and ext₂ fields of address 4 (the first₁ address), and the reference number (= 1) of the character string up to this first character a (the one-character string a) is stored in the ω₁ field.

【００４７】又、アドレス１のfirst₁欄で指示された第
４アドレスのext₁欄には、アドレス１のext₁欄の文字ａ
（１文字列ａの最終文字）に連結する第１文字ｂが書き
込まれ、第４アドレスのext₂欄には、アドレス１のext₁
欄の文字ａに連結する第２文字ａが書き込まれ、第１文
字ｂにはfirst₁方向にアドレス６（first₁アドレス）の
ext₁欄に格納された文字が連結し、第２文字ａにはfirs
t₂方向にアドレス１１（first₂アドレス）のext₁欄に格
納された文字が連結し、第１文字ｂまでの文字列（２文
字列ａｂ）の参照番号（＝４）がω₁欄に格納され、第
２文字ａ迄の文字列（２文字列ａａ）の参照番号（＝１
０）がω₂欄に格納されていることが示される。尚、第
４アドレスのnext₂欄は「空(=0)」であるから、アドレス
１の文字ａ（ext₁)には第１、第２文字ｂ，ａ以外に連
結する文字列は存在しないことがわかる。In the ext₁ field of the fourth address, designated by the first₁ field of address 1, the first character b linked to the character a in the ext₁ field of address 1 (the last character of the one-character string a) is written, and in the ext₂ field of the fourth address, the second character a linked to the character a in the ext₁ field of address 1 is written. The first character b is linked in the first₁ direction to the character stored in the ext₁ field of address 6 (the first₁ address), and the second character a is linked in the first₂ direction to the character stored in the ext₁ field of address 11 (the first₂ address). The reference number (= 4) of the character string up to the first character b (the two-character string ab) is stored in the ω₁ field, and the reference number (= 10) of the character string up to the second character a (the two-character string aa) is stored in the ω₂ field. Since the next₂ field of the fourth address is "empty (= 0)", no character string other than the first and second characters b and a is linked to the character a (ext₁) of address 1.

【００４８】アドレス４のfirst₁欄で指示された第６ア
ドレスのext₁欄には、アドレス４のext₁欄の文字ｂ（２
文字列ａｂの最終文字）に連結する第１文字ｃが書き込
まれ、該第１文字ｃまでの文字列（３文字列ａｂｃ）の
参照番号（＝６）がω₁欄に格納されていることが示さ
れる。尚、第６アドレスのflag欄の内容は1-0であるか
ら、アドレス４の第１文字ｂ（ext₁)には文字ｃ以外に
連結する文字は存在しないことがわかる。又、第６アド
レスのfirst₁欄は「空(=0)」であるから、３文字列ａｂｃ
に連結する文字がないことがわかる。In the ext₁ field of the sixth address, designated by the first₁ field of address 4, the first character c linked to the character b in the ext₁ field of address 4 (the last character of the two-character string ab) is written, and the reference number (= 6) of the character string up to this first character c (the three-character string abc) is stored in the ω₁ field. Since the flag field of the sixth address contains 1-0, no character other than c is linked to the first character b (ext₁) of address 4. Also, since the first₁ field of the sixth address is "empty (= 0)", no character is linked to the three-character string abc.

【００４９】アドレス４のfirst₂欄で指示された第１１
アドレスのext₁欄には、アドレス４のext₂欄の文字ａ
（２文字列ａａの最終文字）に連結する第１文字ａが書
き込まれ、該第１文字ａにはfirst₁方向にアドレス１２
（first₁アドレス）のext₁欄に格納された文字が連結
し、該第１文字ａまでの文字列（３文字列ａａａ）の参
照番号（＝１１）がω₁欄に格納されていることが示さ
れる。尚、第６アドレスのflag欄の内容は1-0であるか
ら、アドレス４の第２文字ａ（ext₂)には他に連結する
文字は存在しないことがわかる。In the ext₁ field of the eleventh address, designated by the first₂ field of address 4, the first character a linked to the character a in the ext₂ field of address 4 (the last character of the two-character string aa) is written. This first character a is linked in the first₁ direction to the character stored in the ext₁ field of address 12 (the first₁ address), and the reference number (= 11) of the character string up to this first character a (the three-character string aaa) is stored in the ω₁ field. Since the flag field of the sixth address contains 1-0, no other character is linked to the second character a (ext₂) of address 4.

【００５０】アドレス１１のfirst₁欄で指示された第１
２アドレスのext₁欄には、アドレス１１のext₁欄の文字
ａ（３文字列ａａａの最終文字）に連結する第１文字ａ
が書き込まれ、第１文字ａまでの文字列（４文字列ａａ
ａａ）の参照番号（＝１２）がω₁欄に格納されている
ことが示される。尚、第１２アドレスのflag欄の内容は
1-0であるから、アドレス１１の第１文字ａ（ext₁)には
他に連結する文字は存在しないことがわかる。又、第１
２アドレスのfirst₁欄は「空(=0)」であるから、４文字列
ａａａａに連結する文字がないことがわかる。以下同様
に、アドレス２の文字ｂ，アドレス３の文字ｃに連結す
る文字列が辞書登録されている。In the ext₁ field of the twelfth address, designated by the first₁ field of address 11, the first character a linked to the character a in the ext₁ field of address 11 (the last character of the three-character string aaa) is written, and the reference number (= 12) of the character string up to this first character a (the four-character string aaaa) is stored in the ω₁ field. Since the flag field of the twelfth address contains 1-0, no other character is linked to the first character a (ext₁) of address 11. Also, since the first₁ field of the twelfth address is "empty (= 0)", no character is linked to the four-character string aaaa. In the same way, character strings linked to the character b at address 2 and the character c at address 3 are registered in the dictionary.
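The entries walked through above (addresses 1, 4, 6, 11, and 12) can be collected into a small table and traversed to find the reference number of a longest match. The sketch below is illustrative only: `entry`, `DICT`, and `longest_match` are hypothetical names, fields the text does not mention are assumed to be 0 or empty, and the continuations of addresses 2 and 3 are left out because the text does not spell them out.

```python
def entry(ext1="", ext2="", w1=0, w2=0, first1=0, first2=0, next2=0):
    """One dictionary address in the layout of FIG. 2 (w1, w2 stand for ω₁, ω₂)."""
    return dict(ext1=ext1, ext2=ext2, w1=w1, w2=w2,
                first1=first1, first2=first2, next2=next2)


# Values taken from the description of FIG. 3; unmentioned fields default to 0.
DICT = {
    1: entry(ext1="a", w1=1, first1=4),
    2: entry(ext1="b", w1=2),           # continuations not described in the text
    3: entry(ext1="c", w1=3),           # continuations not described in the text
    4: entry(ext1="b", ext2="a", w1=4, w2=10, first1=6, first2=11),
    6: entry(ext1="c", w1=6),
    11: entry(ext1="a", w1=11, first1=12),
    12: entry(ext1="a", w1=12),
}


def longest_match(text):
    """Return (reference number, matched length) for the longest prefix of text."""
    omega = "abc".index(text[0]) + 1    # address of the initial single character
    t = 0                               # 0: matched via ext1, 1: matched via ext2
    length = 1
    for k in text[1:]:
        e = DICT[omega]
        i = e["first1"] if t == 0 else e["first2"]
        found = False
        while i != 0 and not found:
            c = DICT[i]
            if c["ext1"] == k:
                omega, t, found = i, 0, True
            elif c["ext2"] == k:
                omega, t, found = i, 1, True
            else:
                i = c["next2"]          # try further candidates, if any
        if not found:
            break
        length += 1
    e = DICT[omega]
    return (e["w1"] if t == 0 else e["w2"]), length
```

For example, `longest_match("aa")` stops at address 4 with the second candidate matched, so the reference number comes from the ω₂ field (10) rather than from the address itself.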

【００５１】図４及び図５は本発明による符号化処理
（辞書検索、辞書登録）の流れ図である。予め、辞書メ
モリのアドレス１〜Ｍのext₁欄（ext₁[1,M]）に文字コ
ード(a,b,c,・・・)を初期登録すると共に（Ｍは文字種
数）、ω₁欄（ω₁[1,M]）に文字コードに対応するアド
レス（参照番号）を初期登録し、更に、flag欄（flag
[1,M]）に1-0(ext₁欄のみに文字が登録されいることを
示す)を初期登録する。FIGS. 4 and 5 are flowcharts of the encoding process (dictionary search and dictionary registration) according to the present invention. In advance, the character codes (a, b, c, ...) are initially registered in the ext₁ fields of addresses 1 to M of the dictionary memory (ext₁[1,M]), where M is the number of character types; the address (reference number) corresponding to each character code is initially registered in the ω₁ fields (ω₁[1,M]); and 1-0 (indicating that a character is registered only in the ext₁ field) is initially registered in the flag fields (flag[1,M]).

【００５２】又、辞書の先頭アドレスｎをＭ＋１とする
（Ｍ＋１→ｎ）。更に、辞書における全アドレスの (1)first₁欄の内容first₁[1,NMAX]、(2)first₂欄の内容
first₂[1,NMAX]、(3)next₂欄の内容next₂[1,NMAX]、(4)
ext₂欄の内容ext₂[1,NMAX]、(5)ω₂欄の内容ω₂[1,NMA
X]を全て０に初期化すると共に、アドレスＭ+1〜ＮＭＡ
Ｘの(6)ext₁欄の内容ext₁[M+1,NMAX]、(7)ω₁欄の内容
ω₁[N+1,NMAX]を全て０に初期化し、又、(8)flag欄flag
[N+1,NMAX]を全て0-0(ext₁欄、ext₂欄に文字が登録され
いないことを示す)に初期化する。The head address n of the dictionary is set to M + 1 (M + 1 → n). Further, for all addresses in the dictionary, (1) the contents first₁[1,NMAX] of the first₁ fields, (2) the contents first₂[1,NMAX] of the first₂ fields, (3) the contents next₂[1,NMAX] of the next₂ fields, (4) the contents ext₂[1,NMAX] of the ext₂ fields, and (5) the contents ω₂[1,NMAX] of the ω₂ fields are all initialized to 0. For addresses M+1 to NMAX, (6) the contents ext₁[M+1,NMAX] of the ext₁ fields and (7) the contents ω₁[N+1,NMAX] of the ω₁ fields are all initialized to 0, and (8) the flag fields flag[N+1,NMAX] are all initialized to 0-0 (indicating that no character is registered in the ext₁ or ext₂ field).
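The initialization described in the two paragraphs above can be sketched with one Python list per field. The function name `init_dictionary`, the dictionary-of-lists layout, and the use of the strings "1-0" and "0-0" for the flag are assumptions for illustration; index 0 is left unused so that 0 can mean "no link", and the sketch assumes that every address above M starts empty.

```python
def init_dictionary(alphabet, nmax):
    """Build the initial field arrays for addresses 1..nmax and return (fields, n)."""
    m = len(alphabet)
    size = nmax + 1                      # index 0 is unused (0 means 'no link')
    fields = {
        "first1": [0] * size,
        "first2": [0] * size,
        "next2": [0] * size,
        "ext1": [0] * size,
        "ext2": [0] * size,
        "w1": [0] * size,                # ω₁ field
        "w2": [0] * size,                # ω₂ field
        "flag": ["0-0"] * size,          # no character registered yet
    }
    for addr, ch in enumerate(alphabet, start=1):
        fields["ext1"][addr] = ch        # initial single-character entries
        fields["w1"][addr] = addr        # reference number equals the address
        fields["flag"][addr] = "1-0"     # only ext1 is occupied
    n = m + 1                            # first free address (M + 1 -> n)
    return fields, n
```

Calling `init_dictionary("abc", 16)` yields n = 4, with addresses 1 to 3 holding a, b, and c and every other field cleared.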

【００５３】更に、検索切り替えパラメータＴ及び登録
切り替えパラメータＵをそれぞれ０にし、又、最初の入
力文字Ｋを入力して該文字の参照番号をｉとし、これを
語頭文字列(prefix string)とする。尚、Ｔ＝０は、fir
st₁欄のアドレスデータが示すアドレスから次の複数の
候補文字を読み出すこと及び符号語出力に際して第１候
補文字の参照番号を出力することを意味し、Ｔ＝１は、
first₂欄のアドレスデータが示すアドレスから次の複数
の候補文字を読み出すこと及び符号語出力に際して第１
候補文字の参照番号を出力することを意味する。また、
Ｕ＝０は、辞書登録時に文字をext₁欄に登録すること
を、Ｕ＝１は、ext₂欄に登録することを意味する。・・
以上ステップ４０１Further, the search switching parameter T and the registration switching parameter U are each set to 0, the first input character K is input, and the reference number of that character is set as i and taken as the prefix string. Here, T = 0 means that the next plurality of candidate characters are read from the address indicated by the address data in the first₁ field and that the reference number of the first candidate character is output when a codeword is output; T = 1 means that the next plurality of candidate characters are read from the address indicated by the address data in the first₂ field and that the reference number of the first candidate character is output when a codeword is output. U = 0 means that a character is registered in the ext₁ field at dictionary registration, and U = 1 means that it is registered in the ext₂ field. The above is step 401.

【００５４】かかる状態で、次の入力文字Ｋを入力し
（ステップ４０２）、ついで、ｉをωに代入すると共に
（ｉ→ω、入力文字Ｋの直前の文字迄の参照番号をωと
する）、ｊ＝０とする（ステップ４０３）。ついで、Ｔ
＝０かチェックし（ステップ４０４）、Ｔ＝０であれ
ば、直前の文字を格納するアドレスｉにおけるfirst₁ア
ドレス(first₁(i))を新たなｉとし（ステップ４０
５）、Ｔ＝１であれば、直前の文字を格納するアドレス
ｉにおけるfirst₂アドレス(first₂(i))を新たにｉとす
る（ステップ４０６）。In this state, the next input character K is input (step 402). Then i is assigned to ω (i → ω; the reference number of the string up to the character immediately before the input character K becomes ω) and j is set to 0 (step 403). Next, it is checked whether T = 0 (step 404). If T = 0, the first₁ address (first₁(i)) at the address i storing the immediately preceding character becomes the new i (step 405); if T = 1, the first₂ address (first₂(i)) at the address i storing the immediately preceding character becomes the new i (step 406).

【００５５】しかる後、ｉ＝０であるかチェックする
（ステップ４０７）。ｉ≠０であれば、第ｉアドレスの
ext₁欄の第１候補文字ext₁(i)が入力文字Ｋと一致する
かチェックし（ステップ４０８）、一致すればＴ＝０と
し（ステップ４０９）、ステップ４１０に飛ぶ。尚、一
致しない場合には後述するステップ４２１に飛び、入力
文字と第２候補文字との比較照合を行う。Thereafter, it is checked whether i = 0 (step 407). If i ≠ 0, it is checked whether the first candidate character ext₁(i) in the ext₁ field of the i-th address matches the input character K (step 408). If it matches, T is set to 0 (step 409) and the process jumps to step 410. If it does not match, the process jumps to step 421, described later, where the input character is compared with the second candidate character.

【００５６】ステップ４０９でＴ＝０とした後、ｉをω
に代入する（ステップ４１０）。すなわち、入力文字と
一致する第１候補文字を記憶するアドレスｉをωとす
る。ついで、データが終了したチェックする（ステップ
４１１）。データが終了してなければステップ４０２に
戻り、次の文字Ｋを入力して以降の処理を繰り返し、最
長一致文字列の検索を行う。一方、データが終了してい
れば、Ｔ＝０かチェックし（ステップ４１２）、Ｔ＝０
であればステップ４１０で保持した第１候補文字の参照
番号ωを符号語 code（ω）として出力し（ステップ４
１３）、符号化処理を終了する。After T is set to 0 in step 409, i is assigned to ω (step 410). That is, the address i storing the first candidate character that matches the input character becomes ω. Next, it is checked whether the data has ended (step 411). If not, the process returns to step 402, the next character K is input, and the subsequent processing is repeated to search for the longest matching character string. If the data has ended, it is checked whether T = 0 (step 412); if T = 0, the reference number ω of the first candidate character held in step 410 is output as the codeword code(ω) (step 413), and the encoding process ends.

【００５７】一方、ステップ４０８で、第ｉアドレスの
ext₁欄の第１候補文字ext₁(i)が入力文字Ｋと一致しな
ければ、第ｉアドレスのflag(i)が1-0か、すなわち、第
２候補文字が存在するかチェックする(ステップ４２
１）。第２候補文字が存在すれば、第ｉアドレスのext₂
欄の第２候補文字ext₂(i)が入力文字Ｋと一致するかチ
ェックし（ステップ４２２）、一致すればＴ＝１とし
（ステップ４２３）、ステップ４１０に飛ぶ。尚、一致
しない場合には後述するステップ４２５に飛び、入力文
字と更に別の候補文字との比較照合を行う。On the other hand, if the first candidate character ext₁(i) in the ext₁ field of the i-th address does not match the input character K in step 408, it is checked whether flag(i) of the i-th address is 1-0, that is, whether a second candidate character exists (step 421). If a second candidate character exists, it is checked whether the second candidate character ext₂(i) in the ext₂ field of the i-th address matches the input character K (step 422). If it matches, T is set to 1 (step 423) and the process jumps to step 410. If it does not match, the process jumps to step 425, described later, where the input character is compared with yet another candidate character.

【００５８】ステップ４２３でＴ＝１とした後、ｉをω
に代入する（ステップ４１０）。すなわち、入力文字と
一致する第２候補文字を記憶するアドレスｉをωとす
る。ついで、データが終了したチェックする（ステップ
４１１）。データが終了してなければステップ４０２に
戻り、次の文字Ｋを入力して以降の処理を繰り返し、最
長一致文字列の検索を行う。一方、データが終了してい
れば、Ｔ＝０かチェックし（ステップ４１２）、Ｔ＝１
であれば第２候補文字の参照番号ω₂（ω）を符号語cod
e（ω₂(ω)）として出力し（ステップ４１４）、符号化
処理を終了する。After T is set to 1 in step 423, i is assigned to ω (step 410). That is, the address i storing the second candidate character that matches the input character becomes ω. Next, it is checked whether the data has ended (step 411). If not, the process returns to step 402, the next character K is input, and the subsequent processing is repeated to search for the longest matching character string. If the data has ended, it is checked whether T = 0 (step 412); if T = 1, the reference number ω₂(ω) of the second candidate character is output as the codeword code(ω₂(ω)) (step 414), and the encoding process ends.

【００５９】又、ステップ４２１で、第ｉアドレスのfl
ag(i)が1-0であれば、換言すれば、第２候補文字が存在
しなければ、入力文字と一致する候補文字は存在しない
ことになり、Ｕ＝１とし（ステップ４２４）、以後、ス
テップ４２６以降の符号語の出力及び辞書登録処理を行
う。一方、ステップ４２２で第ｉアドレスのext₂欄の第
２候補文字ext₂(i)が入力文字Ｋと一致しなければ、換
言すれば、第１、第２候補文字が入力文字と一致しなけ
れば、ｉをｊに代入すると共に、Ｕ＝０とし、かつ、第
１、第２候補文字以外の候補文字の格納アドレスnext
₂(i)を新たなｉとする（next₂(i)→ｉ）。尚、別の候補
文字（next₂方向に連結する候補文字）が存在しない場
合にはnext₂(i)＝０となり、ｉ＝０となる。・・ステッ
プ４２５If flag(i) of the i-th address is 1-0 in step 421, in other words, if no second candidate character exists, then no candidate character matches the input character, so U is set to 1 (step 424), and the codeword output and dictionary registration from step 426 onward are performed. On the other hand, if the second candidate character ext₂(i) in the ext₂ field of the i-th address does not match the input character K in step 422, in other words, if neither the first nor the second candidate character matches the input character, i is assigned to j, U is set to 0, and the storage address next₂(i) of a candidate character other than the first and second candidate characters becomes the new i (next₂(i) → i). If no other candidate character (a candidate character linked in the next₂ direction) exists, next₂(i) = 0 and therefore i = 0 (step 425).

[0060] The flow then returns to step 407 to check whether i = 0. If i ≠ 0, another candidate character exists, so the processing from step 408 onward is repeated. If no other candidate character exists, i = 0, and the code word output and dictionary registration processing from step 426 onward is performed. Once no candidate character matching the input character remains, step 426 checks whether T = 0. If T = 0, the reference number ω of the first candidate character held in step 403 is output as the code word code(ω) (step 427); if T = 1, the reference number ω₂(ω) of the second candidate character is output as the code word code(ω₂(ω)) (step 428).

[0061] After the code word is output, i is assigned to p, n is assigned to i, and n is incremented by 1 (step 429); it is then checked whether U = 0 (step 430). U = 1 only when the flag is 1-0, that is, when only the first candidate character is stored at the address in question. If U = 1 in step 430, the input character K is written into the ext₂ field of the p-th address (the address where the immediately preceding input character was stored) (K → ext₂(p)), and 1-1 is written into its flag field (1-1 → flag(p)). This registers that the input character K is linked to the immediately preceding character (step 431).

[0062] Next, i is written into the ω₂ field of the p-th address (the address where the immediately preceding character was stored) (i → ω₂(p)). The reference number of the character string up to the current input character K is thereby registered in the ω₂ field (step 432). Then i is set to the reference number of the current character K, i is assigned to ω, and T and U are reset to 0 (step 433), after which it is checked whether the data has ended (step 411). If it has not, the process returns to step 402, the next input character is read, and the subsequent processing is repeated. If the data has ended, it is checked whether T = 0 (step 412); since T = 0, the ω of the final character held in step 433 is output as the code word code(ω) (step 413), and the encoding process ends.

[0063] If, on the other hand, U = 0 in step 430, the current input character K is written into the ext₁ field of the i-th address (a new address in which nothing is yet stored) (K → ext₁(i)), and 1-0 is written into its flag field (1-0 → flag(i)). This registers the character string formed by appending the current input character K to the final character of the string so far (the immediately preceding input character) (step 441). Next, it is checked whether j = 0 (step 442). Here j ≠ 0 when i became 0 after step 425, that is, when the first and second candidates both exist, neither matches, and the next₂ field is 0; otherwise j = 0.

[0064] If j ≠ 0, i (the storage address of the current character) is written into the next₂ field of address j (i → next₂(j), step 443), and the processing from step 433 onward is repeated. If j = 0, that is, if no first candidate character exists, or if the input matched the first or second candidate character but no character is linked to that matching candidate, it is checked whether T = 0 (step 444). If T = 0, i (the storage address of the current character) is written into the first₁ field of ω, the storage address of the immediately preceding input character (i → first₁(ω), step 445); if T = 1, i is written into the first₂ field of ω (i → first₂(ω), step 446). The processing from step 433 onward is then repeated.
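
The registration steps 429 to 446 described above can be condensed into a single routine. The following Python sketch is an illustration under stated assumptions, not the flowchart itself: the `Entry` class, the `register` signature, and the flag value "0-0" for an empty element are invented for the example, while `n`, `p`, `u`, `j`, `omega`, and `t` carry the meanings that n, p, U, j, ω, and T have in the text.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    flag: str = "0-0"   # "1-0": only ext1 stored, "1-1": ext1 and ext2 ("0-0" is assumed)
    ext1: str = ""      # first candidate character
    ext2: str = ""      # second candidate character
    omega2: int = 0     # number of the string ending in ext2
    next2: int = 0      # address of further candidates (0 = none)
    first1: int = 0     # address of candidates following ext1 (0 = none)
    first2: int = 0     # address of candidates following ext2 (0 = none)


def register(dic, n, k, p, u, j, omega, t):
    """Register character k using the free address n; return the next free address."""
    if u == 1:
        # Steps 431-432: entry p held only a first candidate, so k becomes
        # its second candidate and n becomes the number of the new string.
        dic[p].ext2, dic[p].flag, dic[p].omega2 = k, "1-1", n
        return n + 1
    # Step 441: k starts a fresh entry at the unused address n.
    dic[n] = Entry(flag="1-0", ext1=k)
    if j != 0:
        dic[j].next2 = n          # step 443: chain after the full entry j
    elif t == 0:
        dic[omega].first1 = n     # step 445: k follows a first candidate
    else:
        dic[omega].first2 = n     # step 446: k follows a second candidate
    return n + 1
```

The caller is assumed to have already run the search, so that `p`, `j`, and `omega` refer to entries that exist in `dic`.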

[0065] To summarize, a single dictionary access reads out the first and second candidate characters ext₁ and ext₂ together with the addresses next₂, first₁, and first₂ at which the candidate characters to be referenced next are stored. Then, (1) if the input character matches the first candidate character, the candidate characters and addresses to be referenced next are read immediately from the first₁ address and compared with the next input character; (2) if the input character matches the second candidate character, they are read immediately from the first₂ address and compared with the next input character; and (3) if the input character matches neither the first nor the second candidate character, they are read immediately from the next₂ address and compared with the current input character. When no matching character can be found in the first or next direction, the dictionary search ends, the code word of the longest matching character string is output, dictionary registration is performed, and the dictionary search then starts again from the next input character. Encoding the character string shown in the top row of FIG. 3(a) according to the above flowcharts produces the dictionary of FIG. 3(b).
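
The three search cases summarized above can be sketched in Python. This is a minimal illustration rather than the flowchart of FIGS. 4 and 5: the `Entry` class, the `longest_match` function, the flag value "0-0" for an empty element, and the contents of `demo` are assumptions made for the example; only the field names come from the description.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One data element, fetched from the dictionary in a single read."""
    flag: str = "0-0"  # "1-0": only ext1 stored, "1-1": ext1 and ext2 ("0-0" is assumed)
    ext1: str = ""     # first candidate character
    ext2: str = ""     # second candidate character
    omega1: int = 0    # number of the string ending in ext1
    omega2: int = 0    # number of the string ending in ext2
    first1: int = 0    # address of the candidates following ext1 (0 = none)
    first2: int = 0    # address of the candidates following ext2 (0 = none)
    next2: int = 0     # address of further candidates at this position (0 = none)


def longest_match(dictionary, start, text):
    """Return (string number, characters consumed) for the longest match."""
    addr, pos, best = start, 0, 0
    while addr != 0 and pos < len(text):
        e = dictionary[addr]          # both candidates and all three addresses
        k = text[pos]
        if e.flag in ("1-0", "1-1") and k == e.ext1:
            best, addr, pos = e.omega1, e.first1, pos + 1   # case (1)
        elif e.flag == "1-1" and k == e.ext2:
            best, addr, pos = e.omega2, e.first2, pos + 1   # case (2)
        else:
            addr = e.next2            # case (3): retry the same input character
    return best, pos


# Hypothetical dictionary: "a" = 10, "b" = 11, "c" = 12, "ab" = 13.
demo = {
    1: Entry(flag="1-1", ext1="a", ext2="b", omega1=10, omega2=11, first1=3, next2=2),
    2: Entry(flag="1-0", ext1="c", omega1=12),
    3: Entry(flag="1-0", ext1="b", omega1=13),
}
```

With `demo`, the call `longest_match(demo, 1, "abc")` follows `first1` from address 1 to address 3 and returns `(13, 2)`, while the input "cab" falls through `next2` to address 2 before matching "c".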

[0066] FIG. 6 shows a first embodiment of the dictionary search circuit according to the present invention. An MPU (microprocessor unit) 12 reads an input character K via a DMA circuit (not shown) and stores it in the first register 14a of the comparison and collation unit 14. It also accesses the dictionary memory 11, using the reference number of the immediately preceding input character as the address, and fetches the following data:
(1) the first character (ext₁) linked to a given character;
(2) the number (ω₁) of the character string up to the first character;
(3) the second character (ext₂), a character linked to the given character and different from the first character;
(4) the number (ω₂) of the character string up to the second character;
(5) the storage address (next₂) of a third character linked to the given character and different from the first and second characters;
(6) the storage address (first₁) of a fourth character linked to the first character;
(7) the storage address (first₂) of a fifth character linked to the second character; and
(8) a flag (flag) indicating how many of the first and second characters are stored.
The MPU 12 then issues a dictionary search command to the controller 16. In response, the controller 16 stores the required items of this data in the register unit 13 all at once: the flag data is latched in register 13a, the first₁ address in register 13b, the first₂ address in register 13c, the next₂ address in register 13d, the first candidate character ext₁ (= K₁) in register 13e, and the second candidate character ext₂ (= K₂) in register 13f, in a single operation.

[0067] Then, under the control of the controller 16, the first and second comparison circuits 14b and 14c of the comparison and collation unit 14 simultaneously compare the first and second candidate characters K₁ and K₂ with the input character K latched in register 14a. The flag data is also input to the comparison circuits 14b and 14c, so they know whether (1) both the first and second candidate characters exist, (2) only the first candidate character exists, or (3) neither candidate character exists.
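
The role of the flag in the two comparison circuits can be illustrated as follows. This is a behavioural sketch only; the function name `compare`, its return convention (1, 2, or 0), and the flag value "0-0" for an element with no candidates are assumptions.

```python
def compare(flag, k1, k2, k):
    """Return 1 or 2 for a match with the first or second candidate, else 0."""
    # Each comparison is gated by the flag, so a candidate that is not
    # actually stored in the element can never produce a match.
    hit1 = flag in ("1-0", "1-1") and k == k1   # comparison circuit 14b
    hit2 = flag == "1-1" and k == k2            # comparison circuit 14c
    if hit1:
        return 1
    return 2 if hit2 else 0
```

In hardware both comparisons are made at the same time; the sequential `if` here only encodes which result is reported.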

[0068] If the comparison shows that the input character matches the first candidate character, the controller 16 causes the address selection unit (multiplexer) 15 to select and output the first₁ address stored in register 13b.

[0069] The connection detection unit 17 determines whether the first₁ address is 0. If it is 0, no further candidate character exists in the first₁ direction, so the unit notifies the MPU 12 that there is no candidate character; if it is not 0, the unit notifies the MPU 12 of the address. On being notified of the first₁ address, the MPU 12 reads the next input character and stores it in register 14a, accesses the dictionary memory 11 using the first₁ address, and stores the data read in the register unit 13 under the control of the controller 16. The comparison and collation operation described above is then repeated.

[0070] On receiving the no-candidate notification from the connection detection unit 17, the MPU 12 notifies the controller 16 that the search for the longest matching character string has finished, creates a code word and outputs it through an I/O port (not shown), and performs dictionary registration in the dictionary memory 11. Then, if the input data has not ended, the MPU 12 instructs the controller 16 to perform a dictionary search, and the same operation is repeated for the next input character string.

[0071] The above describes the case where the comparison by the comparison and collation unit 14 finds that the input character matches the first candidate character. If the input character instead matches the second candidate character, the controller 16 causes the address selection unit 15 to select and output the first₂ address stored in register 13c. The connection detection unit 17 determines whether the first₂ address is 0. If it is 0, no further candidate character exists in the first₂ direction, so the unit notifies the MPU 12 that there is no candidate character; if it is not 0, the unit notifies the MPU 12 of the address. On being notified of the first₂ address, the MPU 12 reads the next input character and stores it in register 14a, accesses the dictionary memory 11 using the first₂ address, and stores the data read in the register unit 13 under the control of the controller 16. The comparison and collation operation described above is then repeated.

[0072] On receiving the no-candidate notification from the connection detection unit 17, the MPU 12 notifies the controller 16 that the search for the longest matching character string has finished, creates a code word and outputs it through an I/O port (not shown), and performs dictionary registration in the dictionary memory 11. Then, if the input data has not ended, the MPU 12 instructs the controller 16 to perform a dictionary search, and the same operation is repeated for the next input character string.

[0073] If the comparison shows that the input character matches neither the first nor the second candidate character, the controller 16 causes the address selection unit 15 to select and output the next₂ address stored in register 13d. The connection detection unit 17 determines whether the next₂ address is 0. If it is 0, no further candidate character exists in the next₂ direction, so the unit notifies the MPU 12 that there is no candidate character; if it is not 0, the unit notifies the MPU 12 of the next₂ address. On being notified of the next₂ address, the MPU 12 accesses the dictionary memory 11 using that address and stores the data read in the register unit 13 under the control of the controller 16. The collation operation described above is then repeated.

[0074] On receiving the no-candidate notification from the connection detection unit 17, the MPU 12 notifies the controller 16 that the search for the longest matching character string has finished, creates and outputs a code word, and performs dictionary registration in the dictionary memory 11. Then, if the input data has not ended, the MPU 12 instructs the controller 16 to perform a dictionary search, and the above operation is repeated.
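
The address selection and connection detection described above amount to a three-way choice followed by a zero test. A minimal sketch, assuming a hypothetical `select_next` function and the match codes 1, 2, and 0 for a match with the first candidate, a match with the second candidate, and no match:

```python
def select_next(match, first1, first2, next2):
    """Return (address, advance_input), or None when no candidate remains."""
    address = {1: first1, 2: first2, 0: next2}[match]   # address selection unit 15
    if address == 0:                                    # connection detection unit 17
        return None                                     # "no candidate" goes to the MPU 12
    # After a match the MPU reads the next input character; after a mismatch
    # the same character is compared against the entry at next2.
    return address, match != 0
```

A `None` result corresponds to the point at which the MPU 12 outputs the code word and performs dictionary registration.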

[0075] FIG. 7 shows another embodiment of the dictionary search circuit according to the present invention; parts identical to those in FIG. 6 carry the same reference numerals. It differs from the configuration of FIG. 6 in that (1) the reference numbers ω₁ and ω₂ of the first and second candidate characters are not stored in the dictionary memory 11, (2) a conversion memory 21 is provided for converting a first address into a reference number (address), and (3) the code word output processing of the MPU 12 is simplified. The address data written in the first₁ field (first₁(i)) coincides with the reference number (ω₁) of the first candidate character designated by that address data, and the address data written in the first₂ field (first₂(i)) coincides with the reference number (ω₂) of the second candidate character designated by that address data. The reference numbers ω₁ and ω₂ can therefore be obtained from the address data first₁(i) and first₂(i) without being stored in the dictionary memory 11.

[0076] Accordingly, when the input character matches the first candidate character, first₁(i) is stored in the conversion memory as a first reference number W₁, and when the input character matches the second candidate character, first₂(i) is stored as the first reference number W₁. If the first or second candidate character designated by the first reference number W₁ matches the next input character, the first reference number W₁ becomes the second reference number W₂ (W₁ → W₂), and a new first reference number W₁ is then obtained and stored in the manner just described. If, however, the first or second candidate character designated by W₁ does not match the next input character, the second reference number W₂ held up to that point is supplied to the MPU 12 as the code word of the longest matching character string found. In this way the MPU 12 need not keep track of the matched reference number ω at every step, which shortens the time until the character input for the next search after a match.

[0077] The present invention has been described above with reference to embodiments, but various modifications are possible within the gist of the invention as set forth in the claims, and the invention does not exclude them.

[0078]

[Effects of the Invention] As described above, according to the present invention, a single dictionary access reads out a plurality of candidate characters, and these are collated with one input character at the same time, so the dictionary search can be performed at high speed. Further, according to the present invention, a single dictionary access reads out a plurality of addresses together with the plurality of candidate characters, and, based on the result of comparing the candidate characters with the one input character (match with the first candidate, match with the second candidate, no match with either, and so on), the candidate characters to be referenced next are read immediately from the corresponding address and compared. The circuit can therefore operate continuously, and the dictionary search can be performed at high speed.

[Brief description of the drawings]

FIG. 1 is a diagram illustrating the principle of the present invention.

FIG. 2 is a diagram illustrating the structure of the dictionary memory according to the present invention.

FIG. 3 is a diagram illustrating the dictionary contents according to the present invention.

FIG. 4 is a first flowchart of the encoding process of the present invention.

FIG. 5 is a second flowchart of the encoding process of the present invention.

FIG. 6 is a first configuration diagram of the dictionary search circuit according to the present invention.

FIG. 7 is another configuration diagram of the dictionary search circuit according to the present invention.

FIG. 8 is an explanatory diagram of LZW encoding.

FIG. 9 is an explanatory diagram of the dictionary configuration.

FIG. 10 is a flowchart of LZW encoding.

FIG. 11 is a flowchart of LZW decoding.

FIG. 12 is an explanatory diagram of the exception case in LZW decoding.

FIG. 13 is an explanatory diagram of LZW decoding.

FIG. 14 is an explanatory diagram of the external hash method.

FIG. 15 is an explanatory diagram of the data structure used by the external hash method.

FIG. 16 is an explanatory diagram of the dictionary structure used by the external hash method.

FIG. 17 is a flowchart of dictionary search and dictionary registration for LZW decoding by the external hash method.

FIG. 18 is a first explanatory chart showing how dictionary registration proceeds.

FIG. 19 is a second explanatory chart showing how dictionary registration proceeds.

FIG. 20 is a third explanatory chart showing how dictionary registration proceeds.

FIG. 21 is a configuration diagram of a conventional dictionary search circuit using the external hash method.

[Explanation of symbols]

11 Dictionary memory
12 MPU
13 Register unit
14 Comparison and collation unit
15 Address selection unit

Continuation of the front page
(72) Inventor: Hirotaka Chiba, c/o Fujitsu Limited, 1015 Kamiodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa
(56) References: JP-A-60-116228 (JP, A); JP-A-3-68219 (JP, A); JP-A-3-204233 (JP, A); JP-A-59-231683 (JP, A); JP-A-58-155589 (JP, A); JP-A-61-13340 (JP, A)
(58) Field searched (Int. Cl.⁷, DB name): H03M 7/40

Claims

(57) [Claims]

1. A dictionary search method in data compression, in which an already encoded character string is divided into mutually different partial character strings, the partial character strings are registered in a dictionary, the partial character string that is the longest match with an input character string is searched for in the dictionary, and the number of the longest matching character string is designated for encoding, the method characterized in that: the partial character strings are registered in the dictionary by storing, in a storage area designated by a searched character string, a plurality of data elements that enable a plurality of candidate characters linked to the searched character string to be retrieved; when the longest matching character string is searched for, the data elements corresponding to the searched character string are read from the dictionary collectively, and the plurality of candidate characters contained in the data elements are compared with one input character; and, if a candidate character matches the input character, the data elements are read from the storage area designated by that candidate character, and the longest match search is continued by comparing the next input character with the plurality of candidate characters in those data elements.

2. The dictionary search method according to claim 1, wherein the data elements comprise a first character (ext₁) linked to the searched character string, a second character (ext₂) linked to the searched character string and different from the first character, a storage address (next₂) of a third character linked to the searched character string and different from the first and second characters, a storage address (first₁) of a fourth character linked to the first character, a storage address (first₂) of a fifth character linked to the second character, and a flag (flag) indicating how many of the first and second characters are stored; and wherein, in comparing one input character with the first and second characters as the plurality of candidate characters: if the input character differs from both the first and second characters, the next data elements are read on the basis of the address (next₂) and compared; if the input character matches the first character, the data elements for the next input character are read on the basis of the address (first₁) and the longest match search is continued; and if the input character matches the second character, the data elements for the next input character are read on the basis of the address (first₂) and the longest match search is continued.