JP2535655B2

JP2535655B2 - Dictionary search method

Info

Publication number: JP2535655B2
Application number: JP2213990A
Authority: JP
Inventors: 茂吉田; 佳之岡田; 泰彦中野; 広隆千葉
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1990-08-13
Filing date: 1990-08-13
Publication date: 1996-09-18
Anticipated expiration: 2011-09-18
Also published as: JPH0496868A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Contents]

Summary; Industrial applications; Conventional technology; Problems to be solved by the invention; Means for solving the problem; Action; Example; Effect of the invention

[Summary]

The invention concerns a dictionary search method used, for example, in incremental-parsing Ziv-Lempel coding, and aims to search the dictionary at high speed. The method searches for the character string represented by an input reference number and an input character. Candidate elements are divided into a plurality of groups based on their extension characters. The dictionary has an index and a list. The index stores, for each combination of a reference number and information about the extension character, the reference number of a candidate element belonging to the corresponding group together with identification information for that candidate element. The list stores, for each reference number, the reference number of another candidate element belonging to the same group as the candidate element with that reference number, together with identification information for that other candidate element. A reading means instructs the dictionary to output a reference number and identification information, at first from the index according to the input reference number and the information about the input character, and thereafter from the list according to the reference number output by the dictionary. A detecting means detects, based on the identification information read from the dictionary, the candidate element to which the input character has been added as its extension character. A judging means determines, based on the reference number read from the dictionary, whether any candidate element remains unread. The reading means, the detecting means, and the judging means operate independently of one another.

[Industrial applications]

The present invention relates to a dictionary search method used, for example, in incremental-parsing (incremental decomposition) Ziv-Lempel coding, which is a kind of universal coding.

In recent years, computers have come to handle many kinds of data, such as character codes, vector information, and image information, and the volume of data handled is growing rapidly.

When such large volumes of data are stored or transmitted, it is desirable to compress them by removing the redundancy they contain. A method that compresses data efficiently regardless of its type is therefore needed.

Universal coding does not require a code table to be defined in advance, so it can be applied to compressing all of the kinds of data mentioned above.

In this specification, one word of data is called a "character", and a run of consecutive words of data is called a "character string".

The Ziv-Lempel code is a representative universal code (see Munakata, "Ziv-Lempel data compression method", Information Processing, Vol. 26, No. 1, 1985), and two algorithms have been proposed for it: a universal type and an incremental-parsing type. The LZSS code (see T. C. Bell, "Better OPM/L Text Compression", IEEE Trans. on Commun., Vol. COM-34, No. 12, Dec. 1986) is an improvement of the universal-type algorithm, and the LZW code (T. A. Welch, "A Technique for High-Performance Data Compression", Computer, June 1984) is an improvement of the incremental-parsing algorithm.

Of these coding schemes, the LZW code has come to be used for purposes such as file compression in storage devices, because it allows high-speed processing and its algorithm is simple.

[Conventional technology]

The incremental-parsing algorithm decomposes the input character string into a sequence of components, each formed by adding one character, the increment, to a substring already registered in the dictionary. It encodes the input by representing each component as the reference number of the registered substring plus the increment. Each component is also registered in the dictionary as a new substring and is used in subsequent encoding.
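The decomposition described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patent's implementation: the function name `incremental_parse`, the use of reference number 0 for the empty substring, and the handling of a trailing match with no increment are all assumptions made for the example.

```python
def incremental_parse(text):
    """Decompose text into (reference number, increment) pairs.

    Assumption of this sketch: reference number 0 stands for the empty
    substring, and registered components are numbered from 1.
    """
    dictionary = {}      # (reference number, character) -> reference number
    next_ref = 1
    components = []
    omega = 0            # reference number of the substring matched so far
    for ch in text:
        key = (omega, ch)
        if key in dictionary:
            omega = dictionary[key]          # keep extending the match
        else:
            components.append((omega, ch))   # registered substring + increment
            dictionary[key] = next_ref       # register the new component
            next_ref += 1
            omega = 0                        # start the next component
    if omega:
        components.append((omega, ""))       # trailing match with no increment
    return components
```

For the string "ababcbababaaaaa" the sketch yields the components a, b, ab, c, ba, bab, aa, aaa, each expressed as a pair of a reference number and one added character.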

In the LZW code, furthermore, the increment is incorporated into the next substring rather than being output with the current one.

For simplicity, the LZW coding scheme is described below for the case where the input is a character string made up of the three characters "a", "b", and "c": "ababcbababaaaaa..." (see Fig. 4(a)).

In this case, the three characters "a", "b", and "c" are given the reference numbers "1", "2", and "3" and registered in the dictionary before encoding starts.

First, the leading character of the input string (for example, "a") is read and looked up in the dictionary, and the reference number corresponding to it (for example, "1") becomes the code ω for the string under consideration.

Then each subsequent character of the input string is read in turn and treated as the extension character K, which corresponds to the increment described above. The dictionary is searched for the substring (ωK) represented by the combination of the code ω and the extension character K (hereinafter, the combination (ωK) is called the representation of the substring). If the substring (ωK) is found, its reference number becomes the new code ω, the next character of the input is read, and the process repeats.

By extending the string to be encoded one character at a time and searching the dictionary for it each time, the longest registered substring that matches the current part of the input is found, and its reference number is output as the code ω. At the same time, the substring formed by appending the extension character K to the substring (ω) for reference number ω is represented by the combination (ωK), given a new reference number, and registered in the dictionary as a new substring.
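The LZW procedure described above can be sketched as follows. The sketch is illustrative only: it uses a Python dict keyed by the pair (ω, K) in place of the hashed dictionary discussed later, and the function name `lzw_encode` and the `alphabet` parameter are assumptions introduced for the example.

```python
def lzw_encode(text, alphabet="abc"):
    """Encode text as a list of reference numbers, LZW style."""
    # Single characters are pre-registered with reference numbers 1, 2, 3, ...
    single = {ch: i + 1 for i, ch in enumerate(alphabet)}
    dictionary = {}                  # (omega, K) -> reference number
    next_ref = len(alphabet) + 1
    codes = []
    if not text:
        return codes
    omega = single[text[0]]          # code for the leading character
    for k in text[1:]:               # k is the extension character K
        if (omega, k) in dictionary:
            omega = dictionary[(omega, k)]     # (omega K) found: extend match
        else:
            codes.append(omega)                # output longest match
            dictionary[(omega, k)] = next_ref  # register (omega K)
            next_ref += 1
            omega = single[k]                  # K starts the next substring
    codes.append(omega)              # flush the final match
    return codes
```

Applied to the 15-character string "ababcbababaaaaa", the sketch outputs codes beginning 1, 2, 4, which agrees with the codes quoted from Fig. 4(b).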

In this way, the character string shown in Fig. 4(a) is decomposed into the substrings underlined in the figure, and the codes "1", "2", "4", ... corresponding to those substrings are output as shown in Fig. 4(b). Fig. 4(c) shows the correspondence between the input character string and the substrings registered in the dictionary, and Table 1 shows an example of the resulting dictionary.

The dictionary built during LZW encoding has a tree structure, as shown in Fig. 5, and each dictionary element corresponds to a node of the dictionary tree. In Fig. 5, the number in parentheses at each node is the reference number of the corresponding dictionary element.

Searching the registered elements one after another for a substring would take a long time, so hashing is applied to the dictionary search to speed it up.

Hashing defines a function (the hash function) that computes, from an element x of a set S of character strings, the address at which x is stored, and stores x at that address. The address computed by the hash function is called the hash address.

For example, the reference number ω and the extension character K could be written as binary numbers and concatenated to form the hash address of the combination (ωK). In that case, however, an enormous amount of memory must be allocated to the dictionary.
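A small calculation shows why direct addressing by the concatenated bits of ω and K is costly. The bit widths below (8-bit characters, 12-bit reference numbers) are hypothetical values chosen for illustration; the source does not state them.

```python
CHAR_BITS = 8    # assumed width of one character
REF_BITS = 12    # assumed width of a reference number (4096 entries)

def direct_address(omega, k):
    """Concatenate the bits of omega and K to form the table address."""
    return (omega << CHAR_BITS) | k

# One slot is needed for every possible (omega, K) combination ...
table_slots = 1 << (REF_BITS + CHAR_BITS)
# ... although at most this many entries are ever registered.
usable_entries = 1 << REF_BITS
```

Under these assumed widths the table needs 1,048,576 slots to hold at most 4,096 entries, so only one slot in 256 can ever be used.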

For this reason, external hashing is used, in which each hash address has a list that stores the elements sharing that hash address. In external hashing, as shown in Fig. 6, looking up the index section with a hash address yields the corresponding list. Each list entry stores identification information for an element and a pointer to the storage location of the next element. The index and the lists thus link together the elements that share a hash address so that they can be searched one after another.
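External hashing as described here is essentially separate chaining. The following is a minimal sketch under stated assumptions: the class name `ChainedTable` and its field names are invented for the example, the caller supplies the hash address, and cell number 0 is reserved to mean "no cell".

```python
class ChainedTable:
    """Index section plus linked list cells; cell number 0 means "no cell"."""

    def __init__(self, buckets):
        self.index = [0] * buckets   # hash address -> first cell of its list
        self.ident = [None]          # identification information per cell
        self.value = [None]          # payload per cell (e.g. a reference number)
        self.link = [0]              # pointer to the next cell in the same list

    def insert(self, address, ident, value):
        self.ident.append(ident)
        self.value.append(value)
        self.link.append(self.index[address])   # new cell points at old head
        self.index[address] = len(self.ident) - 1

    def find(self, address, ident):
        cell = self.index[address]
        while cell != 0:                         # walk the list for this address
            if self.ident[cell] == ident:
                return self.value[cell]
            cell = self.link[cell]
        return None
```

For instance, after `t = ChainedTable(4)`, `t.insert(1, "b", 4)` and `t.insert(1, "a", 10)`, the call `t.find(1, "a")` walks the list for hash address 1 and returns 10.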

For example, the reference number ω may itself be used as the hash address, and the head address of the list that stores the substrings formed by adding one character to the substring for ω is stored at that hash address. The list then stores, in order, the substrings for the nodes that are "children" of the node for reference number ω. This expresses how the candidate elements represented by the combination of the reference number ω and one character K are linked. In this case, the extension character K of each element can be stored in the list as its identification information.

Fig. 7 is a flowchart of the encoding operation when external hashing is used for the dictionary search.

As described above, the dictionary is initialized so that it contains at least the first character of the input string, and the variable n is set to the reference number that will be given to the next substring registered. For example, the reference numbers "1", "2", and "3" given to the characters "a", "b", and "c" are stored in the dictionary as hash addresses, and n is set to "4". Let N_max be the maximum number of substrings that can be registered in the dictionary. Three arrays First, Next, and Ext, each with N_max elements, are defined, and all of their elements are initialized to "0". The array First corresponds to the index section shown in Fig. 6, and the arrays Next and Ext correspond to the lists. Thus First[i] holds the number of the element of Next that heads the list for the node with reference number i, Ext[i] holds the extension character K of the dictionary element with reference number i, and Next[i] holds a pointer to the element that is a "sibling" of the element with reference number i.
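The initial state of the three arrays can be sketched as below. This is an illustration, not the patent's code: the function name `init_dictionary`, the `root` mapping from single characters to reference numbers, and the choice to allocate one extra slot so that index 0 can serve as the "none" marker are assumptions of the sketch.

```python
def init_dictionary(alphabet, n_max=4096):
    """Return (First, Next, Ext, root, n) in their initial state."""
    size = n_max + 1           # slot 0 is reserved as the "none" marker
    first = [0] * size         # First[i]: head of the child list of node i
    nxt = [0] * size           # Next[i]: next sibling of node i
    ext = [0] * size           # Ext[i]: extension character of node i
    # Single characters get reference numbers 1, 2, 3, ...
    root = {ch: i + 1 for i, ch in enumerate(alphabet)}
    n = len(alphabet) + 1      # reference number for the next registration
    return first, nxt, ext, root, n
```

With the alphabet "abc", the sketch gives `root == {"a": 1, "b": 2, "c": 3}` and `n == 4`, matching the example in the text.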

Next, the first character K is read, the reference number corresponding to K is assigned to the variable i, and encoding begins.

First, the next character of the input string is read as the extension character K (step 701). If there was a character to read, step 702 gives an affirmative result and the dictionary search starts.

In this case, the variable i is saved in another variable ω and the variable j is initialized to "0" (step 703). The variable i is then set to the value of First[i], which is the number of the element of the array Next that heads the list (step 704).

If step 705 determines that i is not "0" (negative result), the elements stored in the corresponding list are treated as candidate elements and the search of the list begins.

Ext[i], the extension character of the current candidate element, is compared with the extension character K (step 706). If they differ (negative result in step 706), step 707 sets i to Next[i], the pointer to the next candidate element, and processing returns to step 705. Steps 705 to 707 are repeated in this way to search the list.
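Steps 704 to 707 amount to walking a sibling list. A minimal sketch follows; the function name `find_child` and the sample array contents are assumptions for illustration. The sketch also records the last visited node in j while walking, which the text does not state explicitly but which step 712 appears to rely on.

```python
def find_child(first, nxt, ext, omega, k):
    """Return (i, j): i is the matching child (0 if none), j the last node visited."""
    j = 0
    i = first[omega]          # step 704: head of the list for omega
    while i != 0:             # step 705: is there a candidate left?
        if ext[i] == k:       # step 706: does its extension character match?
            return i, j
        j = i                 # assumption: remember the last sibling seen
        i = nxt[i]            # step 707: follow the sibling pointer
    return 0, j

# Sample state: node 1 ("a") has children 4 ("ab") and 10 ("aa").
SIZE = 12
first, nxt, ext = [0] * SIZE, [0] * SIZE, [0] * SIZE
first[1], ext[4] = 4, "b"
nxt[4], ext[10] = 10, "a"
```

With this sample state, `find_child(first, nxt, ext, 1, "a")` has to pass node 4 before reaching node 10, which illustrates why long sibling lists make the search slow.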

If Ext[i] = K in step 706 (affirmative result), a substring matching the input string is judged to be registered in the dictionary, so processing returns to step 701, the next character is read, and the string extended by that character is encoded.

If, on the other hand, the value of First[i] or Next[i] for the variable i is "0", step 705 gives an affirmative result. This means that no other candidate element linked to the substring with reference number i is registered in the dictionary. The substring being sought is therefore judged not to be registered, and processing continues from step 708.

As described above, each time a matching substring is found in the dictionary, its reference number is saved in the variable ω at step 703. The reference number saved in ω therefore identifies the longest registered substring that matches the input. The code corresponding to ω is output (step 708), and the new substring is registered.

First, the value of n is assigned to i and n is incremented, and the extension character K is stored in Ext[i] (step 709).

Next, it is determined whether j is "0" (step 710). If so, i is stored in First[ω] (step 711), which defines the list for reference number ω. If not, i is stored in Next[j] (step 712), which adds a new "sibling" to the existing list.

After registration, the reference number corresponding to the extension character K is assigned to i (step 713), and processing returns to step 701 and repeats. When no characters remain to be read, step 702 gives a negative result, the code corresponding to the variable ω at that point is output (step 714), and processing ends.
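The whole flow of Fig. 7 can be sketched as one function. This is a reading of the flowchart as described in the text, not the patent's own code. Two points are assumptions of the sketch and are marked in comments: j is updated while walking the list (needed for step 712), and the final code emitted is the reference number of the current match held in i. The names `encode`, `root`, and `n_max` are also introduced for the example.

```python
def encode(text, alphabet="abc", n_max=4096):
    """LZW encoding with the First/Next/Ext arrays, following Fig. 7."""
    first = [0] * (n_max + 1)    # First[i]: head of child list (0 = none)
    nxt = [0] * (n_max + 1)      # Next[i]: next sibling (0 = none)
    ext = [0] * (n_max + 1)      # Ext[i]: extension character
    root = {ch: r + 1 for r, ch in enumerate(alphabet)}
    n = len(alphabet) + 1        # next reference number to assign
    codes = []
    if not text:
        return codes
    i = root[text[0]]
    for k in text[1:]:                   # steps 701-702: read K
        omega, j = i, 0                  # step 703
        i = first[i]                     # step 704
        while i != 0 and ext[i] != k:    # steps 705-706
            j, i = i, nxt[i]             # step 707 (j update is an assumption)
        if i != 0:
            continue                     # match found: extend with next K
        codes.append(omega)              # step 708: output longest match
        if n <= n_max:                   # register only while there is room
            i = n                        # step 709
            n += 1
            ext[i] = k
            if j == 0:
                first[omega] = i         # steps 710-711: start a new list
            else:
                nxt[j] = i               # step 712: append a sibling
        i = root[k]                      # step 713
    codes.append(i)                      # step 714 (assumption: current match)
    return codes
```

The sketch produces the same codes as a plain dictionary-based LZW encoder for "ababcbababaaaaa"; the difference lies only in how the search for (ωK) is carried out.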

[Problems to be Solved by the Invention]

In the conventional method described above, searching a list involves three operations carried out one after another: a link check that determines whether a linked candidate element exists, a match detection that looks for a candidate whose character matches the input extension character, and a read that sets the next pointer and reads from the dictionary. When software walks the list sequentially in this way, substring searches take a long time, especially when there are many candidate elements. The encoding speed is therefore only a few tens of KB/s, so encoding cannot be done in real time at the transfer rates of magnetic tape or magnetic disk devices (several hundred KB/s to several MB/s).

If, on the other hand, a data compression device is built with a separate dedicated element for each step of the encoding process, encoding can be made faster, but the circuit becomes large and the cost rises.

The conventional example above was described, for simplicity, for a string made up of three characters, but real strings consist of many different characters. In a dictionary search, therefore, the most time is normally spent walking the list for a reference number, reading out the "sibling" candidate elements one after another, detecting a matching element, and checking whether a linked element exists.

The present invention was made in view of these points, and its object is to provide a dictionary search method that searches the dictionary at high speed.

[Means for solving the problem]

Fig. 1 is a block diagram showing the principle of the present invention.

(i) Invention of claim 1

In the figure, the dictionary search method searches, among the distinct character strings registered in the dictionary 110 under their respective reference numbers, for the character string represented by an input reference number and an input character. The candidate elements, each obtained by adding a distinct character as an extension character to a character string that has a reference number, are divided into a plurality of groups based on their extension characters. The dictionary 110 has an index 111 and a list 112. The index 111 stores, for each combination of a reference number and information about the extension character, the reference number given to one of the candidate elements belonging to the corresponding group, together with identification information for that candidate element. The list 112 stores, for each reference number, the reference number given to another candidate element belonging to the same group as the candidate element with that reference number, together with identification information for that other candidate element.
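The index and list of claim 1 can be pictured with the following sketch. It is a hypothetical model, not the claimed hardware: Python dicts stand in for the memories, the class and method names are invented, and the "information about the extension character" is assumed here to be the low-order bits of the character code, which the text above does not specify.

```python
GROUP_BITS = 2   # assumption: group = low-order 2 bits of the character code

def group_of(k):
    """Hypothetical 'information about the extension character'."""
    return ord(k) & ((1 << GROUP_BITS) - 1)

class GroupedDictionary:
    def __init__(self):
        # (parent reference number, group) -> (reference number, extension
        # character) of one candidate element in that group
        self.index = {}
        # reference number -> (reference number, extension character) of
        # another candidate element in the same group
        self.list = {}

    def register(self, parent, k, ref):
        key = (parent, group_of(k))
        if key not in self.index:
            self.index[key] = (ref, k)     # first candidate of the group
            return
        cur = self.index[key][0]
        while cur in self.list:            # find the end of the group's chain
            cur = self.list[cur][0]
        self.list[cur] = (ref, k)          # link the new candidate
```

Registering the children "b" (4), "a" (10), and "e" (12) of node 1 places "b" in one group and chains "a" and "e" together in another, so a search for "e" never has to look at "b".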

The reading means 121 instructs the dictionary 110 to output a reference number and identification information: at first, those stored in the index 111 for the input reference number and the information about the input character, and thereafter, those stored in the list 112 for the reference number output by the dictionary 110.

The detecting means 122 detects, based on the identification information read from the dictionary 110, the candidate element to which the input character has been added as the extension character, and outputs the detection result as a search result.

The judging means 123 determines, based on the reference number read from the dictionary 110, whether any candidate element remains unread, and outputs the result as a search result.

Overall, the reading means 121, the detecting means 122, and the judging means 123 are configured to operate independently of one another.

(ii) Invention of claim 2

The reading means 121 of the invention of claim 2 starts read operations on the dictionary 110 at fixed time intervals, and, in the dictionary search method of claim 1, the detecting operation by the detecting means 122 and the judging operation by the judging means 123 are carried out in parallel with the read operation by the reading means 121.

[Action]

(i) Invention of claim 1

In the invention of claim 1, the candidate elements, each obtained by adding a distinct character as an extension character to a character string that has a reference number, are divided into a plurality of groups based on their extension characters. The index 111 stores, for each combination of a reference number and information about the extension character, the reference number given to one of the candidate elements belonging to the corresponding group, together with identification information for that candidate element. The list 112 stores, for each reference number, the reference number given to another candidate element belonging to the same group as the candidate element with that reference number, together with identification information for that other candidate element.

The index 111 corresponds to the index section of external hashing, and the list 112 corresponds to its lists. The reference numbers stored in the index 111 and the list 112 also serve as pointers to the storage location of the next candidate element, and they thereby express how the candidate elements of one group are linked. The extension character of each character string can be stored as the identification information.

At first, the reading means 121 instructs the dictionary 110 to output the reference number and identification information stored in the index 111 for the input reference number and the information about the input character. Thereafter, the reading means 121 instructs the dictionary 110 to output the reference number and identification information stored in the list 112 for the reference number that the dictionary 110 has just output.

In this way, the identification information (for example, the extension characters) of the candidate elements linked by the pointers is read out one after another, first from the index 111 and then from the list 112.

The detecting means 122 therefore only needs to output, as a search result, that the candidate element has been detected when the extension character read from the dictionary 110 as identification information matches the input character.

Likewise, when the reference number output from the dictionary 110 does not correspond to any character string registered in the dictionary 110, the judging means 123 only needs to determine that no unread candidate element remains and output that determination as a search result.

In the invention of claim 1, the candidate elements for each reference number are divided into a plurality of groups based on information about the extension character, and the target character string is sought only among the candidate elements of the group that corresponds to the input reference number and the information about the input character. This reduces the number of times the reading means 121 reads candidate elements from the dictionary 110 and the number of times the detecting means 122 performs match detection, so the search can be carried out at high speed. Moreover, because the reading means 121, the detecting means 122, and the judging means 123 operate independently, none of them has to wait for a preceding operation to finish as in the conventional method, which further speeds up the dictionary search.
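The reduction in reads can be illustrated by counting dictionary accesses with and without grouping. The sketch below is a software model under stated assumptions: the grouping rule (low-order bits of the character code), the function names, and the sample data are all hypothetical, and setting `group_bits` to 0 puts every candidate in one group, which mimics the conventional single list.

```python
def group_of(k, group_bits):
    """Hypothetical grouping rule: low-order bits of the character code."""
    return ord(k) & ((1 << group_bits) - 1)

def build(children, group_bits):
    """Build (index, chain) from (parent, character, reference number) triples."""
    index, chain, tail = {}, {}, {}
    for parent, k, ref in children:
        key = (parent, group_of(k, group_bits))
        if key not in index:
            index[key] = (ref, k)          # first candidate of the group
        else:
            chain[tail[key]] = (ref, k)    # link after the group's last member
        tail[key] = ref
    return index, chain

def search(index, chain, omega, k, group_bits):
    """Return (matching reference number or None, number of dictionary reads)."""
    reads = 1
    entry = index.get((omega, group_of(k, group_bits)))   # first read: index
    while entry is not None:       # judging: is there a candidate to examine?
        ref, ext = entry
        if ext == k:               # detecting: extension character matches
            return ref, reads
        entry = chain.get(ref)     # later reads: list
        reads += 1
    return None, reads

# Node 1 with eight children "a".."h" holding reference numbers 10..17.
CHILDREN = [(1, ch, 10 + n) for n, ch in enumerate("abcdefgh")]
```

With a single group, finding "h" takes 8 reads; with four groups under the assumed rule it takes 2, because only "d" and "h" share a group.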

(ii) Invention of claim 2

In the invention of claim 2, the reading means 121 reads from the dictionary 110 at predetermined time intervals, and the detecting operation by the detecting means 122 and the judging operation by the judging means 123 are carried out in parallel with the read operation by the reading means 121.

For example, the reading means 121 may perform one read operation per interval equal to the time required to read from the dictionary 110. The detection operation by the detection means 122 and the determination operation by the determination means 123 can normally be expected to finish in less time than a read operation, so performing the read operation in parallel with the detection and determination operations allows the search to be pipelined.
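As a rough, idealised illustration of why pipelining helps (the model and the symbols $t_d$ and $n$ are assumptions introduced here, not taken from the specification), let $\tau$ be the time of one read operation, $t_d \le \tau$ the combined time of detection and determination, and $n$ the number of candidate elements read. Then, approximately:

```latex
T_{\mathrm{serial}} = n\,(\tau + t_d), \qquad
T_{\mathrm{pipelined}} \approx n\,\tau + t_d
```

Under these assumptions the pipelined search costs about one read time per candidate element, because each detection and determination is hidden behind the next read.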

Therefore, in the invention of claim 2, the number of read operations and match detection operations is reduced, and the read operation is pipelined with the match detection and determination operations, so that the search for a character string in the dictionary 110 can be speeded up.

〔Example〕

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 2 shows the configuration of a data compression apparatus to which a dictionary search method according to one embodiment of the present invention is applied.

First, the correspondence between FIG. 1 and the embodiment is described.

The dictionary 110 corresponds to the dictionary 230.

The index 111 corresponds to the index section 231.

The list 112 corresponds to the list section 232.

The reading means 121 corresponds to the candidate element holding unit 241, the timing control circuit 244, and the flip-flop (FF) 245a.

The detection means 122 corresponds to the extended character register 261 and the comparison circuit 262.

The determination means 123 corresponds to the NOR circuit 243.

Given this correspondence, the configuration and operation of the embodiment are described below.

In FIG. 2, 201 denotes a microprocessor (MPU), 202 an input port, 230 a dictionary, 240 a dictionary search circuit, and 205 an output port. The MPU 201, the input port 202, the dictionary 230, the dictionary search circuit 240, and the output port 205 are interconnected via a bus 206.

A character string input via the input port 202 is LZW-encoded by the MPU 201 and transferred via the output port 205 to a magnetic disk device (not shown) or the like, where it is stored.

At this time, the MPU 201 instructs the dictionary search circuit 240 to search the dictionary 230 for the portion of the input character string currently under consideration (hereinafter, the substring of interest).

In LZW encoding, a substring is represented by a reference number ω, which corresponds to the substring found so far, and the final character K, which is appended to that substring as an extended character. The dictionary search circuit 240 therefore only has to search for the substring carrying the extended character K among the candidate elements, that is, the substrings obtained by appending one distinct character as an extended character to the leading part of the substring of interest (the part excluding K).

Here, the substring of interest belongs to the group of candidate elements whose extended character has the same most significant bit as the most significant bit K_m of the extended character K. Therefore, if the candidate elements are divided into two groups according to the most significant bit of their extended characters and only the candidate elements in the group corresponding to K_m are searched, the substring of interest can be found.
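The grouping rule can be illustrated with a short sketch. It assumes 8-bit character codes, which the text does not state at this point, and the function names `msb` and `split_by_msb` are illustrative only.

```python
def msb(ch: int) -> int:
    """Most significant bit of an (assumed) 8-bit character code."""
    return (ch >> 7) & 1


def split_by_msb(extended_chars):
    """Partition extended characters into the two groups keyed by their MSB."""
    groups = {0: [], 1: []}
    for ch in extended_chars:
        groups[msb(ch)].append(ch)
    return groups


# Only the group selected by the MSB of the input character K has to be searched.
groups = split_by_msb([0x61, 0xE3, 0x62, 0x81])
k = 0x62
candidates_to_search = groups[msb(k)]
```

Here only the two characters whose most significant bit is 0 remain as candidates for `k`, rather than all four.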

The following describes the configuration of the dictionary 230 when the candidate elements are divided into two groups according to the most significant bit of the extended character and searched.

As shown in FIG. 3, the dictionary 230 consists of 3N contiguous storage areas and, in response to an incoming read signal, outputs the contents of the storage area specified by the read address.

Of these 3N storage areas, those whose upper two address bits are "00" or "01" are allocated to the index section 231, which corresponds to the index of the external hashing method described above. Those whose upper two address bits are "10" are allocated to the list section 232, which corresponds to the collection of lists in the external hashing method. Each storage area of the index section 231 and the list section 232 consists of a pointer part and an identification information part. The pointer part stores a pointer indicating where a candidate element linked to the substring whose reference number is the lower part of the address (the address without its upper two bits) is stored, and the identification information part stores the identification information for that candidate element.
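The address layout described above can be sketched as follows. The reference-number width `REF_BITS` is an assumption (the text only says each region holds N entries), and the constant and function names are hypothetical.

```python
REF_BITS = 12  # assumed width of a reference number (so N = 2**12 here)

REGION_INDEX_0 = 0b00  # index section, extended characters with MSB 0
REGION_INDEX_1 = 0b01  # index section, extended characters with MSB 1
REGION_LIST = 0b10     # list section


def make_address(region: int, ref: int) -> int:
    """Place the two region bits above the reference number."""
    return (region << REF_BITS) | ref


def split_address(addr: int):
    """Recover (region bits, reference number) from an address."""
    return addr >> REF_BITS, addr & ((1 << REF_BITS) - 1)
```

With this layout the three regions occupy three consecutive blocks of N addresses each, which matches the 3N contiguous storage areas.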

In the pointer part of each area of the index section 231 whose upper two address bits are "00" (hereinafter, index section 231₀₀), it suffices to store, as a pointer, the reference number of a substring obtained by appending a character whose most significant bit is "0" to the substring whose reference number is the lower part of the address. Likewise, in the pointer part of each area whose upper two address bits are "01" (hereinafter, index section 231₀₁), it suffices to store, as a pointer, the reference number of a substring obtained by appending a character whose most significant bit is "1" to the substring whose reference number is the lower part of the address. In this way, the candidate elements obtained by appending one character as an extended character to the substring of each reference number are stored separately in the index section 231₀₀ and the index section 231₀₁ according to the most significant bit of their extended characters.

In the pointer part of each area of the list section 232, it suffices to store, as a pointer, the reference number given to another substring that has the same leading part (the part excluding the extended character) and the same most significant bit of the extended character as the substring indicated by the corresponding reference number, but differs in the remaining part.

In the identification information part of each storage area of the index section 231 and the list section 232, it suffices to store the extended character of the respective candidate element as identification information.

In this way, the index section 231 and the list section 232 express the links among candidate elements that belong to the same group, that is, whose extended characters have the same most significant bit. For example, FIG. 3 shows the links among the candidate elements obtained by appending an extended character whose most significant bit is logic "0" to the substring with reference number "1" (reference numbers ω₁ and ω₂ in the figure), and the links among the candidate elements obtained by appending an extended character whose most significant bit is logic "1" to the substring with reference number "2" (reference numbers ω_i and ω_j in the figure).

The contents of every storage area of the index section 231 and the list section 232 are set to the initial value "0" when encoding starts.

The dictionary search circuit 240 comprises a candidate element holding unit 241 that holds information about candidate elements read from the dictionary 230; a match detection unit 242 that detects, among the incoming candidate elements, the one whose extended character matches the extended character K input from the MPU 201; a NOR circuit 243; a timing control circuit 244 that controls the operation timing of these units; and two flip-flops (FF) 245a and 245b.

Data output by the index section 231 and the list section 232 of the dictionary 230 is supplied to the candidate element holding unit 241 via the bus 206. The candidate element holding unit 241 consists of two address registers 251 and 252 and a candidate character register 253.

Of the data supplied via the bus 206, the pointer part is fed to the address register 251 and the identification information part to the candidate character register 253. The output of the address register 251, excluding its most significant bit, is fed to the address register 252 and held there temporarily.

These registers store the supplied data in response to a load signal from the timing control circuit 244.

The match detection unit 242 consists of an extended character register 261, which stores the input extended character K, and a comparison circuit 262, which compares the extended character K stored in the extended character register 261 with the candidate character stored in the candidate character register 253. The comparison circuit 262 outputs logic "1" when the extended character K and the candidate character match, and its output is supplied to the MPU 201 as a match detection signal.

The output of the address register 251 is fed to the NOR circuit 243, and the output of the NOR circuit 243 is supplied to the MPU 201.

The timing control circuit 244 controls the FF 245a and generates and outputs a read signal and a load signal in response to instructions from the MPU 201. The read signal is input to the dictionary 230, and the load signal is input to the address registers 251 and 252 and the candidate character register 253 of the candidate element holding unit 241.

The output of the FF 245a is fed to the FF 245b and is also supplied to the dictionary 230 as the most significant bit of the read address. The output of the address register 251 of the candidate element holding unit 241 is supplied to the dictionary 230 as the lower part of the read address.

The search of the dictionary 230 by the dictionary search circuit 240 is described below.

For example, to encode the character string shown in FIG. 4(a), the first character "a" is read and taken as the leading part of the substring of interest, and the corresponding reference number (for example, "1") is obtained. The next character "b" is then read as the extended character K, and the value formed by prepending the most significant bit of "b" to that reference number is input to the address register 251 as the hash address. The character "b" is also input to the extended character register 261 as the extended character K, and the timing control circuit 244 of the dictionary search circuit 240 is instructed to start the search.

In response to this search start instruction, the timing control circuit 244 first resets the output of the FF 245a to logic "0" and then supplies a read signal to the dictionary 230. Logic "0" is thus supplied as the most significant bit of the read address, selecting the index section 231 of the dictionary 230. Either the index section 231₀₀ or the index section 231₀₁ is then selected according to the most significant bit of the contents of the address register 251, and the data in the storage area of the selected index section corresponding to the reference number ω is supplied to the dictionary search circuit 240 via the bus 206. In this way, the extended character of the first candidate element and the pointer indicating another candidate element linked to it are read from the index section 231 and fed to the candidate character register 253 and the address register 251, respectively.

After the time τ required to read data from the dictionary 230 (the read cycle time) has elapsed since the read signal was output, the timing control circuit 244 supplies a load signal to the address registers 251 and 252 and the candidate character register 253 of the candidate element holding unit 241.

In response to this load signal, the address register 251 stores the pointer read from the pointer part of the relevant storage area of the index section 231. At the same time, the timing control circuit 244 sets the FF 245a to logic "1". From then on, the most significant bit of the read address is logic "1", so the list section 232 of the dictionary 230 is selected, and the next candidate element linked to the first candidate element is read from the storage area of the list section 232 corresponding to that pointer.

Thereafter, unless instructed otherwise by the MPU 201, the timing control circuit 244 repeats the cycle of outputting a read signal and then outputting a load signal once the read cycle time τ has elapsed. The candidate elements linked by pointers are thus read from the list section 232 one after another. Meanwhile, the address register 252 holds the reference number read as a pointer immediately before, and the FF 245b holds the most significant bit of the read address specified in the immediately preceding read operation.
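The read sequence (index section first, then the list section by following pointers) can be modelled in software. The sketch below is a purely sequential model, not the pipelined hardware; it assumes 8-bit characters, a 12-bit reference number, and a Python dict standing in for the memory, and all names are hypothetical.

```python
REF_BITS = 12  # assumed reference-number width


def index_address(ref: int, k: int) -> int:
    # Upper bits: 0 (index section) followed by the MSB of the extended character.
    return (((k >> 7) & 1) << REF_BITS) | ref


def list_address(ref: int) -> int:
    # Upper two bits "10" select the list section.
    return (0b10 << REF_BITS) | ref


def search(memory, ref, k):
    """Walk the chain for (ref, MSB of k).

    memory maps address -> (pointer, identification character); a missing
    entry models the initial value 0. Returns (matched_ref, last_address,
    reads); matched_ref is None when the chain ends without a match.
    """
    addr = index_address(ref, k)
    reads = 0
    while True:
        pointer, ident = memory.get(addr, (0, 0))
        reads += 1
        if pointer == 0:   # role of the NOR circuit: no linked candidate
            return None, addr, reads
        if ident == k:     # role of the comparison circuit
            return pointer, addr, reads
        addr = list_address(pointer)


# Substring 1 extended by 'b' (ref 10), 'c' (ref 11) and 0xE3 (ref 12).
memory = {
    index_address(1, 0x62): (10, 0x62),
    list_address(10): (11, 0x63),
    index_address(1, 0xE3): (12, 0xE3),
}
```

In this example the candidate with extended character 0xE3 is found in a single read, because it sits in a different group from 'b' and 'c'.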

In this manner, the candidate elements belonging to the group corresponding to the most significant bit K_m of the extended character K are read sequentially, one per read cycle time τ.

The match detection unit 242 operates independently of the candidate element holding unit 241. Therefore, in parallel with the read operation, the comparison circuit 262 of the match detection unit 242 compares the extended character K with the character that was read previously and stored in the candidate character register 253.

As described above, the comparison circuit 262 outputs logic "1" as the match detection result to the MPU 201 when the candidate character matches the extended character. When the MPU 201 receives logic "1" as the match detection result, it determines that the relevant substring has been found and performs the interrupt processing described below.

In this case, the MPU 201 first instructs the timing control circuit 244 to stop the read operation and reads the reference number held in the address register 252. This reference number corresponds to the substring read immediately before, that is, the substring that the match detection unit 242 found to match the substring of interest. The MPU 201 then reads the next character of the input character string as a new extended character K and inputs it to the extended character register 261, inputs the most significant bit K_m of this extended character K together with the reference number to the address register 251, and instructs the dictionary search circuit 240 to start searching for the corresponding substring.

Thus, each time a substring matching the substring of interest is found, the next character is appended to the found substring as the extended character K to lengthen the substring of interest, and this longer substring is searched for in turn as encoding continues.

Like the match detection unit 242, the NOR circuit 243 also operates independently. In parallel with the read operation, the NOR circuit 243 determines whether a linked candidate element exists by checking whether the address register 251 holds a valid pointer, that is, a value other than the initial value "0".
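The determination amounts to a wide NOR over the pointer bits: the output is 1 only when every bit is 0. A minimal sketch, assuming a 12-bit pointer (the width and the function name are illustrative, not from the specification):

```python
def nor_all_bits(pointer: int, width: int = 12) -> int:
    """Return 1 when every bit of the pointer is 0, otherwise 0."""
    bits = [(pointer >> i) & 1 for i in range(width)]
    return 0 if any(bits) else 1
```

An output of 1 therefore means the register still holds the initial value "0", so no further candidate element is linked.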

When the NOR circuit 243 outputs logic "1" as the determination result, indicating that there is no linked candidate element, the MPU 201 performs the interrupt processing described below.

First, the MPU 201 instructs the timing control circuit 244 of the dictionary search circuit 240 to stop the read operation and outputs, as the code, the reference number ω corresponding to the last substring found. Next, the MPU 201 reads the reference number held in the address register 252 and the output of the FF 245b, and registers a new substring according to the output of the FF 245b.

For example, when the output of the FF 245b is logic "0", the MPU 201 sets the most significant bit of the write address to "0" and registers the new substring in the index section 231. In this case, in the storage area of the index section 231 indicated by the most significant bit K_m of the extended character K and the reference number, the MPU 201 stores as the pointer the reference number ω_n given to the substring obtained by appending the extended character K to the substring corresponding to the reference number ω, and stores the extended character K as the identification information. The first candidate element belonging to the group corresponding to the reference number ω and the most significant bit K_m of the extended character K is thereby registered.

When the output of the FF 245b is logic "1", on the other hand, the MPU 201 sets the most significant bit of the write address to "1" and registers the new substring in the list section 232. In this case, the MPU 201 stores the reference number ω_n as the pointer and the extended character K as the identification information in the storage area of the list section 232 corresponding to the pointer described above. A new candidate element belonging to the group corresponding to the reference number ω and the most significant bit K_m of the extended character K is thereby registered.

Encoding then continues with the extended character K as the leading part of the substring of interest and the next character of the input character string as the new extended character K.
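Putting search and registration together, the encoding loop can be sketched in software as below. This is a sequential model under stated assumptions, not the circuit itself: characters are 8-bit, reference numbers 1 to 256 are pre-assigned to the single bytes 0 to 255 (0 is reserved for "no pointer"), reference numbers are 12 bits wide, and no handling of a full dictionary is included. All names are hypothetical.

```python
REF_BITS = 12  # assumed reference-number width


def index_address(ref: int, k: int) -> int:
    return (((k >> 7) & 1) << REF_BITS) | ref  # index section, selected by MSB of k


def list_address(ref: int) -> int:
    return (0b10 << REF_BITS) | ref            # list section


def encode(data: bytes):
    memory = {}      # address -> (pointer, identification character)
    next_ref = 257   # first free reference number (assumption, see above)
    codes = []
    if not data:
        return codes
    ref = data[0] + 1                  # reference number of the first character
    for k in data[1:]:
        addr = index_address(ref, k)   # first read goes to the index section
        while True:
            pointer, ident = memory.get(addr, (0, 0))
            if pointer == 0:
                # No linked candidate: output the code and register the new
                # substring in the empty slot that was just read (index
                # section on the first read, list section afterwards).
                codes.append(ref)
                memory[addr] = (next_ref, k)
                next_ref += 1
                ref = k + 1            # K becomes the new leading part
                break
            if ident == k:
                ref = pointer          # match: extend the substring of interest
                break
            addr = list_address(pointer)  # follow the chain into the list section
    codes.append(ref)
    return codes
```

For the input `b"abacac"`, the third miss registers "ac" in the list section behind "ab", and the final "ac" is then found by following that link.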

As described above, the pointers in the storage areas of the index section 231 and the list section 232 of the dictionary 230 express the links among candidate elements whose extended characters have the same most significant bit, and the candidate elements are divided into two groups according to that bit and searched.

If logic "1" and logic "0" are equally likely to appear as the most significant bit of a character, the two groups formed in this way can be expected to contain roughly the same number of candidate elements. Dividing the candidate elements into two groups for the search therefore roughly halves the number of candidate elements that must be examined, so the relevant candidate element can be detected quickly.

Furthermore, the timing control circuit 244 supplies a read signal to the dictionary 230 every read cycle time τ, and a read address formed from the pointer read from the dictionary 230 is supplied to the dictionary 230. Because the match detection unit 242 and the NOR circuit 243 operate independently and perform match detection and link determination in parallel with reading from the dictionary 230, the linked candidate elements can be searched without involving the MPU 201, and the read processing can be pipelined with the match detection and link determination processing.

Compared with the conventional approach, in which every candidate element is searched via the MPU, this greatly shortens the time required to search the candidate elements.

Thus, with a simple circuit such as that shown in FIG. 2, the search for character strings in the dictionary 230 can be speeded up, shortening the dictionary search time and hence speeding up encoding. The encoding rate can then be made comparable to the transfer rate to the magnetic disk device, so encoded data can be transferred to a magnetic disk device or the like in real time.

Moreover, the MPU 201 only has to perform the interrupt processing described above in response to the outputs of the match detection unit 242 and the NOR circuit 243, and need not operate particularly fast.

Although the embodiment above divides the candidate elements into two groups according to the most significant bit of each candidate element's extended character, there is no restriction on the number of groups or the method of division. The invention is applicable to any scheme that divides the candidate elements into a plurality of groups based on their extended characters and searches within each group.

For example, the candidate elements may be divided into four groups according to the upper two bits of the extended character, or into candidate elements whose extended-character bit pattern contains more "1"s and those whose bit pattern contains more "0"s. Alternatively, a lookup table (LUT) storing logic "1" or logic "0" for each bit pattern of the extended character may be provided, and the candidate elements divided into the group for which the LUT outputs logic "1" and the group for which it outputs logic "0". The point is to divide the candidate elements so that the groups contain roughly equal numbers of them.
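These alternative divisions can each be expressed as a small grouping function. The sketch below assumes 8-bit characters; how a tie of four "1" bits is assigned is not specified in the text, so sending ties to group 0 is an arbitrary choice made here, and all names are hypothetical.

```python
def group_by_top2(ch: int) -> int:
    """Four groups keyed by the upper two bits of the extended character."""
    return (ch >> 6) & 0b11


def group_by_popcount(ch: int) -> int:
    """Group 1 if the bit pattern has more 1s than 0s, else group 0 (ties -> 0)."""
    return 1 if bin(ch & 0xFF).count("1") > 4 else 0


def make_lut(group_one_chars):
    """Build a 256-entry lookup table holding 1 for the given characters, 0 otherwise."""
    lut = [0] * 256
    for ch in group_one_chars:
        lut[ch] = 1
    return lut


def group_by_lut(lut, ch: int) -> int:
    """Group number taken directly from the lookup table."""
    return lut[ch]


vowel_lut = make_lut(b"aeiou")  # arbitrary example table
```

A lookup table allows the split to be tuned to the character statistics of the data, which is one way to keep the groups close to equal in size.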

When part of the extended character is included in the hash address, as in the embodiment above, only the part of the extended character not included in the hash address needs to be stored in the dictionary 230 as the identification information of each candidate element. This eliminates the overlap between the hash address and the identification information and prevents the memory used for the dictionary 230 from growing.
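For the most-significant-bit split, this means only the lower seven bits need to be stored, since the top bit is already implied by the group being searched. A minimal sketch, assuming 8-bit characters and hypothetical function names:

```python
def pack_ident(ch: int) -> int:
    """Keep only the 7 bits that are not already encoded in the hash address."""
    return ch & 0x7F


def ident_matches(stored: int, k: int) -> bool:
    """Within one group the MSBs are equal, so comparing 7 bits is sufficient."""
    return stored == (k & 0x7F)


def restore_char(group_msb: int, stored: int) -> int:
    """Rebuild the full character from the group bit and the stored 7 bits."""
    return (group_msb << 7) | stored
```

Each stored entry is then one bit narrower than the full character.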

Although the embodiment above describes application to a data compression apparatus, the invention is not limited to this and is applicable whenever a dictionary with a tree structure is built using the external hashing method and searched.

〔Effect of the invention〕

As described above, according to the invention of claim 1, the number of read operations and match detection operations is reduced and the reading means, detection means, and determination means operate independently of one another, so the dictionary search can be speeded up and encoding can be made faster.

According to the invention of claim 2, the number of read operations and match detection operations is reduced and the read operation is pipelined with the match detection and determination operations, so the search for character strings in the dictionary can be speeded up further.

[Brief description of drawings]

FIG. 1 is a block diagram of the principle of the present invention, FIG. 2 is a configuration diagram of a data compression apparatus according to one embodiment of the present invention, FIG. 3 shows the structure of the dictionary in the embodiment, FIG. 4 illustrates the LZW encoding method, FIG. 5 shows the structure of a dictionary, FIG. 6 illustrates the external hashing method, and FIG. 7 is a flowchart of a conventional encoding operation.

In the figures, 110 is the dictionary, 111 the index, 112 the list, 121 the reading means, 122 the detection means, 123 the determination means, 201 the microprocessor, 202 the input port, 205 the output port, 206 the bus, 230 the dictionary, 231 the index section, 232 the list section, 240 the dictionary search circuit, 241 the candidate element holding unit, 242 the match detection unit, 243 the NOR circuit, 244 the timing control circuit, 245 the flip-flops (FF), 251 and 252 the address registers, 253 the candidate character register, 261 the extended character register, and 262 the comparison circuit.

Continuation of the front page
(72) Inventor: Hirotaka Chiba, c/o Fujitsu Limited, 1015 Kamiodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa
(56) References: JP-A-2-85927 (JP, A); JP-A-2-132556 (JP, A)

Claims

(57) [Claims]

1. A dictionary search system for searching, from among mutually different character strings registered in a dictionary (110) in correspondence with reference numbers assigned to each of them, for a character string represented by an input reference number and an input character, wherein the dictionary (110) comprises: an index (111) that divides candidate elements, each obtained by adding one different character as an extended character to a character string to which a reference number is assigned, into a plurality of groups based on the extended character, and that stores, in correspondence with the reference number and information about the extended character, the reference number assigned to one of the candidate elements belonging to each group and identification information corresponding to that candidate element; and a list (112) that stores, in correspondence with a reference number, the reference number assigned to one of the other candidate elements belonging to the same group as the candidate element to which that reference number is assigned, and identification information corresponding to that candidate element; the system further comprising: a reading means (121) for instructing the dictionary (110) to output, at first, the reference number and the identification information stored in the index (111) in correspondence with the input reference number and information about the input character, and thereafter the reference number and the identification information stored in the list (112) in correspondence with the reference number output by the dictionary (110); a detecting means (122) for detecting, based on the identification information read from the dictionary (110), the candidate element to which the input character is added as the extended character, and outputting the detection result as a search result; and a determining means (123) for determining, based on the reference number read from the dictionary (110), whether there is a candidate element that has not yet been read, and outputting the determination result as a search result; wherein the reading means (121), the detecting means (122), and the determining means (123) operate independently of one another.

2. The dictionary search system according to claim 1, wherein the reading means (121) starts read operations on the dictionary (110) at predetermined time intervals, and the detecting operation by the detecting means (122) and the determining operation by the determining means (123) are performed in parallel with the read operation by the reading means (121).