JP3062119B2

JP3062119B2 - Character string search table, method for creating the same, and character string search method

Info

Publication number: JP3062119B2
Application number: JP9163913A
Authority: JP
Inventors: 吉昭高橋
Original assignee: 日本電気アイシーマイコンシステム株式会社
Priority date: 1997-06-20
Filing date: 1997-06-20
Publication date: 2000-07-10
Anticipated expiration: 2017-06-20
Also published as: JPH1115836A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a character string search method, and more particularly to a character string search method that achieves high-speed searching by using a character string search table. The invention also relates to the character string search table itself and to a method of creating it.

[0002]

[Prior Art] Conventional general-purpose character string search methods include the sequential search method, the binary tree search method, and the hash search method. Let N be the number of reserved words against which a given character string is checked for a match. The average number of search steps for each method is then:

Sequential search method: (N + 1) / 2
Binary tree search method: log₂ N
Hash search method: 1 (subject to conditions)

Except for the hash search method, the number of search steps grows as the number of reserved words grows, which makes fast reserved-word searching difficult. The hash search method was therefore the fastest of the conventional search methods.

[0003] Each method is described in detail below.

[0004] [Prior Art 1] Sequential search method

In the sequential search method, N reserved words (N > 0) are stored in advance in an array of character strings. For an input character string, the array index is incremented from 1 to N, the reserved word stored in each array element is referenced in turn, and the input character string is compared with that reserved word. Because the search proceeds linearly, the method is also called the linear search method.
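
The sequential search described above can be sketched as follows. This is a minimal illustration, not code from the patent: the function name `sequential_search` and the comparison counter are assumptions, and the four sample words are borrowed from the embodiment later in the description.

```python
def sequential_search(reserved_words, text):
    """Return (index, comparisons); index is None when text is not reserved."""
    comparisons = 0
    for index, word in enumerate(reserved_words):
        comparisons += 1              # one string comparison per array element
        if word == text:
            return index, comparisons
    return None, comparisons

words = ["NEC", "XYZ", "NIMS", "NET"]

# Averaged over the N stored words, a successful search costs (N + 1) / 2
# comparisons; here N = 4.
average = sum(sequential_search(words, w)[1] for w in words) / len(words)
print(average)
```

An unsuccessful search always costs N comparisons, because every array element has to be examined before the search can give up.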

[0005] [Prior Art 2] Binary tree search method

In the binary tree search method, N reserved words (N > 0) are stored in advance in a binary tree in which each node holds a data item and two pointers, left and right. All data in descendants reached through the left pointer is smaller than the node's own data, and all data in descendants reached through the right pointer is larger. For an input character string, the search starts at the root node and compares the input character string with the reserved word stored in the node. If they are equal, the search ends. If the input is smaller, the search moves to the descendant node linked by the left pointer; if it is larger, the search moves to the descendant node linked by the right pointer. The input character string is then compared with the reserved word stored in the new node, and this is repeated.
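
A binary tree search of this kind might look like the sketch below. The names `Node`, `insert`, and `tree_search` are hypothetical, and the insertion order of the sample words is chosen arbitrarily for illustration; the text itself does not say how the tree is built.

```python
class Node:
    def __init__(self, word):
        self.word = word
        self.left = None    # descendants whose words are smaller
        self.right = None   # descendants whose words are larger

def insert(root, word):
    """Insert word into the tree rooted at root and return the root."""
    if root is None:
        return Node(word)
    if word < root.word:
        root.left = insert(root.left, word)
    elif word > root.word:
        root.right = insert(root.right, word)
    return root

def tree_search(root, text):
    """Return (found, comparisons), starting from the root node."""
    node, comparisons = root, 0
    while node is not None:
        comparisons += 1
        if text == node.word:
            return True, comparisons
        node = node.left if text < node.word else node.right
    return False, comparisons

root = None
for w in ["NET", "NEC", "NIMS", "XYZ"]:
    root = insert(root, w)
print(tree_search(root, "XYZ"))
```

The log₂ N average quoted earlier assumes a reasonably balanced tree; the small tree built here leans to the right, so the deepest word needs three comparisons.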

[0006] [Prior Art 3] Hash search method

A function h(x) that converts data x into an integer distributed as uniformly as possible over the range 0 ≤ h(x) < n is called a hash function, and a method that searches for reserved words using such a function is called a hash search method. A simple hash function converts the data x into an integer, divides it by a value M, and takes the remainder as the function value. M is customarily a prime number.

[0007] There are two typical ways of tying the hash function value to the location where a reserved word is registered. One is to prepare n of the binary search trees described above and register data x in tree number h(x). The other is to prepare an array that stores n reserved words and register each reserved word directly in the array element whose index is h(x). Because the binary tree approach takes longer to search than the array approach, the following describes the approach that uses an array storing n reserved words.

[0008] In this approach, each reserved word is converted to an integer in advance using the hash function, and the reserved word is stored in the array element whose index is that integer. At search time, the input character string is passed through the hash function to obtain an integer, and the array element with that index is examined.

[0009] As an example, FIG. 9 shows the flow for creating the reserved word table in the hash search method, FIG. 10 shows the processing flow of the hash function, and FIG. 11 shows the processing flow of the hash search method.

[0010] As preprocessing for the hash search method, an array (hash table) for storing the reserved words is prepared in step 901. A hash function is then created in step 902 (described later). In step 903, a reserved word is passed to the hash function of step 902 to obtain a hash value, and the reserved word is stored in the element of the array of step 901 indexed by that hash value. Steps 902 to 903 are repeated for each reserved word.

[0011] The internal processing of the hash function of step 902 is as follows. FIG. 10 shows one example of the internal processing of a simple hash function 902 that converts the input character string to an integer, divides it by a prime value M, and takes the remainder as the function value. First, in step 1001, the hash value h, one of the internal variables, is initialized to 0. Likewise, the character pointer s, another internal variable, is initialized to point to the first character of the input character string. In step 1002 the character pointed to by s is obtained, and in step 1003 it is determined whether that character marks the end of the character string. If it does, processing proceeds to step 1008; otherwise it proceeds to step 1004. In step 1004 the value of h is shifted left by 8 bits (multiplied by 256), and in step 1005 the character code of the character pointed to by s is added to h. In step 1006 the value of h is divided by the number of elements in the hash table prepared in step 901, and the remainder is assigned back to h. In step 1007 the character pointer s is advanced by one, and processing repeats from step 1002. Once the entire input character string has been read by this loop, the final hash value h is returned in step 1008.
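
The steps of FIG. 10 translate almost line for line into code. In the sketch below the function name `hash_value` and the table size of 11 are assumptions chosen for illustration; the step numbers in the comments refer to the flow described above.

```python
def hash_value(text, table_size):
    """Simple hash: shift left 8 bits, add the character code, take the remainder."""
    h = 0                       # step 1001: initialize the hash value
    for ch in text:             # steps 1002, 1003, 1007: walk the characters
        h = h << 8              # step 1004: multiply by 256
        h = h + ord(ch)         # step 1005: add the character code
        h = h % table_size      # step 1006: remainder by the table size
    return h                    # step 1008: return the final value

print(hash_value("NEC", 11))  # 8
```

Taking the remainder inside the loop keeps h small at every step, and the result equals the remainder of the whole string read as one large base-256 number.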

[0012] Finally, the processing flow of the hash search method is described with reference to FIG. 11. In step 1101 an input character string is obtained. In step 1102 it is determined whether character string input has ended, for example by checking whether the length of the input character string is 0; if not, processing proceeds to step 1103. In step 1103 the input character string is passed to the hash function of step 902, its return value is used as the index into the array of step 901, and that element is checked to see whether the reserved word is stored in it. If a reserved word is found in step 1104, the processing for a found reserved word is executed in step 1105, and processing returns to the character string input of step 1101. If no reserved word is found, processing likewise returns to step 1101. When the input character strings are exhausted, the end of input is recognized in step 1102 and the processing flow ends.
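
Table creation (FIG. 9) and lookup (FIG. 11) can be sketched together as below. The function names and the table size of 13 are assumptions; 13 was picked only because the four sample words from the embodiment happen not to collide at that size.

```python
def hash_value(text, table_size):
    """Hash function of FIG. 10: shift, add, take the remainder."""
    h = 0
    for ch in text:
        h = ((h << 8) + ord(ch)) % table_size
    return h

def build_hash_table(reserved_words, table_size):
    table = [None] * table_size                     # step 901: prepare the array
    for word in reserved_words:                     # steps 902-903, repeated
        # A colliding word would silently overwrite an earlier one here.
        table[hash_value(word, table_size)] = word
    return table

def is_reserved(table, text):
    # Steps 1103-1104: hash the input and inspect that single element.
    return table[hash_value(text, len(table))] == text

table = build_hash_table(["NEC", "XYZ", "NIMS", "NET"], 13)
print([is_reserved(table, w) for w in ["NEC", "NET", "NEW"]])
```

Each lookup examines exactly one array element, which is where the "one search, subject to conditions" figure comes from; the condition is that no two reserved words share a hash value.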

[0013]

[Problems to Be Solved by the Invention] The first problem is as follows. In the hash search method, each reserved word is converted in advance by the hash function into an integer distributed as uniformly as possible, and each reserved word is stored using that integer as the element number of the array that stores the reserved words. Depending on the number of reserved words registered and the number of characters in each reserved word, however, hash values may collide.

[0014] The reason is as follows. In general, when the computer's internal representation uses n bits per character and a character string consists of m characters, n × m bits are needed to express an integer unique to that character string. Unless the hash value can also be expressed in n × m bits, the character string cannot be mapped to it one-to-one. For example, if the hash value must fit in the register length of a 32-bit CPU (32 bits), then even with ASCII characters (1 character = 8 bits) only character strings up to four characters long can be mapped one-to-one. Hash values may therefore collide for character strings longer than four characters.
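
The four-character limit can be checked with a small sketch. The helper `pack` is a hypothetical name introduced here; it builds the string-specific integer by giving each ASCII character 8 bits, and the 32-bit register is modeled by masking.

```python
MASK_32 = 0xFFFFFFFF  # models the width of a 32-bit register

def pack(text):
    """Integer unique to the string: 8 bits per ASCII character."""
    value = 0
    for ch in text:
        value = (value << 8) | ord(ch)
    return value

# Four characters need 4 x 8 = 32 bits, so the value survives unchanged.
fits = pack("NIMS") == pack("NIMS") & MASK_32

# Five characters need 40 bits. Truncating to 32 bits drops the first
# character, so two different strings end up with the same value.
collide = (pack("ANIMS") & MASK_32) == (pack("BNIMS") & MASK_32)
print(fits, collide)
```

Any mapping from 40-bit inputs to 32-bit outputs must produce such collisions; truncation is simply the easiest case to see.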

[0015] The second problem is that even if multiple hash functions are used (in multiple stages) so that a unique value is always generated, the time spent on arithmetic inside the hash functions increases, and generating the unique value becomes slow.

[0016] The reason is as follows. On current CPUs, arithmetic instructions, and multiplication and division instructions in particular, take longer than other instructions such as branch instructions. As noted under the first problem, a division is always required inside the hash function whenever the hash function value would exceed the number of data bits the CPU can handle. Using multiple hash functions means executing multiplication and division instructions many times, which increases the computation time.

[0017] As an example, the execution times of the operations on the μPD70208 are as follows.

[0018]
Addition: 14 clocks
Subtraction: 14 clocks
Multiplication: 39 clocks
Division: 34 clocks
Comparison: 16 clocks
8-bit left shift: 15 clocks
Branch: 15 clocks

These two problems stand in the way of faster character string searching.
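
As a rough illustration of why the arithmetic matters, the clock counts listed above can be summed for one pass of the hash loop of FIG. 10. The sketch counts only the shift, add, and divide of steps 1004 to 1006. Which instructions a real implementation would execute per character is an assumption made here, not something the text states.

```python
# Clock counts for the uPD70208 as listed above.
CLOCKS = {
    "add": 14, "subtract": 14, "multiply": 39, "divide": 34,
    "compare": 16, "shift_left_8": 15, "branch": 15,
}

def hash_arithmetic_clocks(word_length):
    """Clocks spent on shift + add + divide for a word of the given length."""
    per_character = CLOCKS["shift_left_8"] + CLOCKS["add"] + CLOCKS["divide"]
    return per_character * word_length

print(hash_arithmetic_clocks(1))  # 63
print(hash_arithmetic_clocks(4))  # 252
```

Under these assumptions the division alone accounts for more than half of the per-character arithmetic, which is more than twice the cost of a single 15-clock branch.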

[0019] An object of the present invention is therefore to provide a high-speed character string search method.

[0020]

[0021]

[Means for Solving the Problems] The character string search table creation method according to the present invention creates a character string search table that is a two-dimensional array whose rows are all the character kinds that can be input and whose columns are the state value kinds indicating the state of the character string search processing. Each array element stores either a branch destination address for branching to a processing procedure that changes the state value, determined in advance according to the search processing state of all the reserved words, or a branch destination address for branching to the processing procedure for the case where a reserved word has been found. The method comprises:

a first step of setting the row-direction size of the character string search table to the number of all character kinds that can be input;
a second step of preparing a state value "0", which indicates the initial value of the character string search state, and a processing unit "process 0" that changes the state value to "0";
a third step of sorting all the reserved words to be registered in the character string search table;
a fourth step of taking the first reserved word, "reserved word 1", from the sorted reserved words and preparing state values "1" to "n" for its first to n-th characters, respectively (n being the character string length of reserved word 1);
a fifth step of setting the column-direction size of the character string search table to cover state values "0" to "n";
a sixth step of preparing n processing units, from a processing unit "process 1" that changes the state value to "1" to a processing unit "process n" that changes the state value to "n";
a seventh step of preparing a process "process reserved word 1" for the case where "reserved word 1" has been found;
an eighth step of storing, in the elements of the character string search table whose rows are the first to n-th characters of "reserved word 1" and whose columns are state values "0" to "n − 1", the branch addresses to "process 1" to "process n", respectively;
a ninth step of storing, in the element whose row is the delimiter character and whose column is state value "n", the address of "process reserved word 1";
a tenth step of taking the next sorted reserved word, "reserved word m", and obtaining the number A of characters in "reserved word m";
an eleventh step of comparing "reserved word m" with "reserved word m − 1" and obtaining the number B of identical characters that run consecutively from the first character to the n-th character of both reserved words;
a twelfth step of preparing A − B new state values, from state value "n + 1" to state value "n + A − B";
a thirteenth step of setting the column-direction size of the character string search table anew to cover state values "0" to "n + A − B";
a fourteenth step of preparing A − B processing units, from a processing unit "process n + 1" that changes the state value to "n + 1" to a processing unit "process n + A − B" that changes the state value to "n + A − B";
a fifteenth step of preparing a process "process reserved word m" for the case where "reserved word m" has been found;
a sixteenth step of storing, in the element whose row is the (B + 1)-th character of "reserved word m" and whose column is state value "B", the branch address to the processing unit "process n + 1";
a seventeenth step of storing, in the elements whose rows are the (B + 2)-th to A-th characters of "reserved word m" and whose columns are state values "n + 1" to "n + A − B − 1", the branch addresses to "process n + 2" to "process n + A − B", respectively;
an eighteenth step of storing, in the element whose row is the delimiter character and whose column is state value "n + A − B", the address of "process reserved word m";
a nineteenth step of assigning the state value "n + A − B" to the state value "n"; and
a twentieth step of repeating the tenth to nineteenth steps for all the remaining sorted reserved words and, after all the reserved words have been registered, storing the branch address to the processing unit "process 0" in every element of the character string search table in which nothing has been stored.

[0022] A recording medium according to the present invention records a program for causing a computer to execute the character string search method described above.

[0023] A recording medium according to the present invention also records a program for causing a computer to execute the character string search table creation method described above.

[0024]

[0025]

[Embodiments of the Invention] The description proceeds in the following order.

[0026]
A. Method of creating the character string search table
B. Character string search method using the character string search table

[0027] In this embodiment, the characters handled are ASCII characters, and the characters that can be input are the 128 characters from 0x00 to 0x7F in hexadecimal notation. Because 128 kinds of characters can be input, the character string search table has 128 character elements. The character that separates one character string from the next (defined here as the delimiter) is the space character (character code 0x20). The reserved words are defined as follows.

[0028]
NEC
XYZ
NIMS
NET

First, a state value "0" for the character string search processing state is prepared, and the processing unit "process 0", which changes the state value to "0", is defined as follows.

[0029]

[Table 1]

Next, the reserved words are sorted. Either ascending or descending order may be used; this example sorts in ascending order. After sorting, the reserved words are ordered as follows.

[0030]
NEC
NET
NIMS
XYZ

The first reserved word, NEC, consists of three characters, so three new state values "1", "2", and "3" are prepared. The character string search table for recognizing the reserved word NEC is therefore prepared as shown in FIG. 4.

[0031] Next, three processing units that update the state value are prepared, one for each of these three state values. The three processing units are defined as "process 1", "process 2", and "process 3", and each is given a label so that its address can be obtained. A processing unit "process NEC" for the case where the reserved word NEC has been found is also created.

[0032] These processing units are shown below.

[0033]

[Table 2]

The "read processing unit" that appears in each process is explained in the flow description later. The processing performed in "process NEC" when the reserved word NEC has been found is arbitrary.

[0034] Next, using the first character "N" of the reserved word NEC and state value "0", the character string search table of FIG. 4 is referenced, and the branch address to "process 1" is stored in the element with state value "0" and character element "N". Likewise, using the second character "E" and state value "1", the branch address to "process 2" is stored in the element with state value "1" and character element "E". Using the third character "C" and state value "2", the branch address to "process 3" is stored in the element with state value "2" and character element "C". Finally, using the delimiter (the space character) and state value "3", the branch address to "process NEC" is stored in the element with state value "3" and the delimiter as its character element. This completes the character string search table for finding the reserved word NEC. The resulting table is shown in FIG. 5.
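
Registering the first reserved word can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: branch addresses are modeled as tuples, `("goto", k)` standing for the branch to "process k" and `("found", word)` for the branch to the found-word process, and the table is indexed as `table[state][character code]`. The name `register_first_word` is hypothetical.

```python
CHAR_KINDS = 128   # ASCII 0x00 to 0x7F, as in the embodiment
DELIMITER = " "    # character code 0x20

def register_first_word(word):
    """Build the table for a single reserved word, one state per character."""
    # States 0 .. len(word); every element starts out empty (None).
    table = [[None] * CHAR_KINDS for _ in range(len(word) + 1)]
    for i, ch in enumerate(word):
        table[i][ord(ch)] = ("goto", i + 1)       # branch to "process i+1"
    table[len(word)][ord(DELIMITER)] = ("found", word)
    return table

table = register_first_word("NEC")
filled = sum(entry is not None for column in table for entry in column)
print(len(table), filled)  # 4 4
```

Only four elements are filled for NEC: three character transitions and one delimiter entry. The remaining elements stay empty until the final fill with the branch to "process 0".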

[0035] The second and subsequent reserved words are then registered. For each of them, whether new state values must be prepared is determined by the calculation described below.

[0036] Let A be the number of characters in the reserved word about to be registered, and let B be the number of identical characters that run consecutively from the first character in both the previously registered reserved word and the reserved word about to be registered. Then A − B is the number of new states required. When new state values are required, the character string search table must be extended in the state value direction.
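
The A − B calculation is a common-prefix count. A minimal sketch, with the hypothetical function name `new_states_needed`, applied to the sorted sample words of the embodiment:

```python
def new_states_needed(previous_word, word):
    """Return (A, B, A - B) for registering word after previous_word."""
    a = len(word)
    b = 0
    for x, y in zip(previous_word, word):
        if x != y:
            break
        b += 1          # still inside the shared leading characters
    return a, b, a - b

print(new_states_needed("NEC", "NET"))   # (3, 2, 1)
print(new_states_needed("NET", "NIMS"))  # (4, 1, 3)
print(new_states_needed("NIMS", "XYZ"))  # (3, 0, 3)
```

Sorting the reserved words first places words with shared leading characters next to each other, so comparing against only the previously registered word is enough to find the reusable states in this example.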

[0037] In this example, the second reserved word to be registered is NET. NET has three characters, so A = 3. Next, the previously registered reserved word NEC and the reserved word NET are compared to find how many characters match from the first character onward. "N" and "E" are the same in both, two characters in total, so B = 2. Hence A − B = 1, and one new state value is needed. A new state value "4" is therefore prepared, and the elements of the character string search table in the state value direction (column direction) now run from state value "0" to state value "4". A process "process 4" that updates the state value and a process "process NET" for the case where the reserved word NET has been found are also provided. "Process 4" and "process NET" are defined as follows.

[0038]

[Table 3]

Next, using the third character (the (B + 1)-th character) "T" of the reserved word NET and state value "2" (state value "B"), the character string search table is referenced, and the branch address to the processing unit "process 4" is stored in the element with state value "2" and character element "T". Then, using the delimiter character and state value "4" (state value "n + A − B", where n is the highest state value at the point when registration of the previous reserved word finished), the branch address to "process NET" is stored in the element with state value "4" and the delimiter as its character element. The resulting character string search table is shown in FIG. 6.

[0039] The next reserved word, NIMS, is handled in the same way. The previously registered reserved word is NET. The number of characters in the reserved word about to be registered is A = 4, and the number of identical characters running consecutively from the first character in both words is B = 1, so A − B = 3. Three new state values "5", "6", and "7" are provided, and the size of the character string search table in the state value direction (column direction) is set anew to cover state values "0" to "7". Processing units "process 5", "process 6", and "process 7", which change the state value to "5", "6", and "7" respectively, are provided, along with a processing unit "process NIMS" for the case where the reserved word NIMS has been found. These processing units are shown below.

[0040]

[Table 4]

Next, using the second character "I" of the reserved word NIMS and state value "1" (the state value becomes "1" when the first character is "N"), the branch address to the processing unit "process 5" is stored in the element with state value "1" and character element "I". Likewise, using the third and fourth characters of NIMS and state values "5" and "6", the branch address to the processing unit "process 6" is stored in the element with state value "5" and character element "M", and the branch address to "process 7" is stored in the element with state value "6" and character element "S". Finally, using the delimiter character and state value "7", the branch address to the processing unit "process NIMS" is stored in the element with state value "7" and the delimiter as its character element. The resulting character string search table is shown in FIG. 7.

[0041] The last reserved word, XYZ, is handled in the same way. The previously registered reserved word is NIMS. Computing A and B as before gives A = 3 and B = 0, so A − B = 3. Three new state values "8", "9", and "10" are provided, and the column-direction size of the character string search table is set anew to cover state values "0" to "10". Processing units "process 8", "process 9", and "process 10", which change the state value to "8", "9", and "10" respectively, are provided, along with a process "process XYZ" for the case where the reserved word XYZ has been found. These processing units are shown below.

[0042]

[Table 5]

Next, using the first character "X" of the reserved word XYZ and state value "0", the branch address to the processing unit "process 8" is stored in the element with state value "0" and character element "X". Likewise, using the second character "Y" and third character "Z" of XYZ and state values "8" and "9", the branch address to the processing unit "process 9" is stored in the element with state value "8" and character element "Y", and the branch address to "process 10" is stored in the element with state value "9" and character element "Z". Then, using the delimiter character and state value "10", the branch address to "process XYZ" is stored in the corresponding element. The resulting character string search table is shown in FIG. 8.

【００４３】Finally, in the character string search table of FIG. 8, the branch address to "process 0" is stored in every element that was not used when registering the reserved words NEC, NET, NIMS, and XYZ. This completes the setup of the character string search table for the reserved words NEC, NET, NIMS, and XYZ.
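
The state numbering used in the worked example can be checked with a short sketch. The Python below is illustrative only and is not part of the patent text; the function name `allocate_states` and its return format are assumptions made here. For each reserved word in sorted order it computes A (the word length), B (the length of the prefix shared with the previously registered word), and the A - B new state values.

```python
def allocate_states(words):
    """Return (word, A, B, new_states) for each reserved word in sorted order."""
    rows = []
    n = 0       # highest state value allocated so far
    prev = ""   # previously registered reserved word
    for word in sorted(words):
        a = len(word)
        # B: number of leading characters shared with the previous word.
        b = 0
        while b < min(a, len(prev)) and word[b] == prev[b]:
            b += 1
        # A - B new state values, numbered n+1 .. n+A-B.
        new_states = list(range(n + 1, n + a - b + 1))
        rows.append((word, a, b, new_states))
        n += a - b
        prev = word
    return rows


for row in allocate_states(["NEC", "NET", "NIMS", "XYZ"]):
    print(row)
```

For the four reserved words this gives states 1 to 3 for NEC, state 4 for NET, states 5 to 7 for NIMS, and states 8 to 10 for XYZ, matching the description.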

【００４４】The generalized procedure for creating the character string search table is described with reference to FIG. 2.

【００４５】In the first procedure (201), the set of all character types that can be input defines the size of the character string search table in the row direction.

【００４６】In the second procedure (202), the initial state value "0" is prepared, together with the initialization processing unit "process 0", which sets the state value to the initial state value.

【００４７】In the third procedure (203), all reserved words to be registered are sorted in ascending or descending order according to a sort criterion.

【００４８】In the fourth procedure (204), the state values corresponding to the first reserved word are prepared. The number "n" of these state values is the character string length of the first reserved word. Normally, the state values are assigned sequentially starting from 1.

【００４９】In the fifth procedure (205), the size of the character string search table in the column direction is determined. The columns run from "0" to "n", so the column-direction size is the character string length of the first reserved word + 1.

【００５０】In the sixth procedure (206), the state value change (update) processing units are prepared. In the above example, the units from "process 1", which changes (updates) the state value to "1", through "process 3", which changes the state value to "3 (= n)", are prepared.

【００５１】In the seventh procedure (207), the processing unit that is executed when the first reserved word is found is prepared. In the above example, "process NEC" is prepared.

【００５２】In the eighth procedure (208), the branch addresses to the state change processing units are stored in the character string search table. In the above example, the branch address to "process 1", for instance, is stored in the element whose state value is "0" and whose character element is "N".

【００５３】In the ninth procedure (209), the branch address to the processing unit for the case where the first reserved word is found is stored in the element of the character string search table whose row is the delimiter character and whose column is the state value "n". In the above example, the branch address to "process NEC" is stored in the element whose state value is "3" and whose character element is the delimiter.

【００５４】In the tenth procedure (211), which is entered when there is a next reserved word, the number of characters A constituting the m-th reserved word (m is an integer of 2 or more), which is about to be registered, is obtained.

【００５５】In the eleventh procedure (212), the number B of characters that match consecutively, counted from the first character, between the m-th reserved word and the (m-1)-th reserved word is obtained.

【００５６】In the twelfth procedure (213), A - B new state values, from state value "n+1" through state value "n+A-B", are prepared.

【００５７】In the thirteenth procedure (214), the size of the character string search table in the column direction is enlarged by A - B, so that the range now runs from state value "0" to state value "n+A-B".

【００５８】In the fourteenth procedure (215), A - B processing units are prepared, from "process n+1", which changes the state value to "n+1", through "process n+A-B", which changes the state value to "n+A-B". In the above example, when "NIMS" is being handled, "process 5", which updates the state value to "5", and the units that follow it are prepared.

【００５９】In the fifteenth procedure (216), the processing unit that is executed when the m-th reserved word is found is prepared. In the above example, "process NIMS", for instance, is prepared.

【００６０】In the sixteenth procedure (217), the branch address to the processing unit "process n+1" is stored in the element of the character string search table whose row is the (B+1)-th character of "reserved word m" and whose column is the state value "B". In the above example, when "NIMS" is being handled, the branch address to "process 5" is stored in the element whose row is "I", the second character of "NIMS", and whose column is the state value "1".

【００６１】In the seventeenth procedure (218), the branch addresses to "process n+2" through "process n+A-B" are stored in the elements of the character string search table whose rows are the (B+2)-th through A-th characters of "reserved word m" and whose columns are the state values "n+1" through "n+A-B-1", respectively. In the above example, when "NIMS" is being handled, the branch addresses to "process 6" and "process 7" are stored in the elements whose rows are the third and fourth characters of "NIMS" and whose columns are the state values "5 (= 4+1)" and "6 (= 4+4-1-1)", respectively.

【００６２】In the eighteenth procedure (219), the address of the process "process reserved word m" is stored in the element of the character string search table whose row is the delimiter character and whose column is the state value "n+A-B". In the above example, when "NIMS" is being handled, the address of "process NIMS" is stored in the element whose row is the delimiter character and whose column is the state value "7 (= 4+4-1)".

【００６３】In the nineteenth procedure (220), the state value "n+A-B" is assigned to the state value "n". In the above example, when "NET" is being handled, "n", which was "3", is assigned the state value "4 (= 3+3-2)".

【００６４】In the twentieth procedure (221), which is entered when there is no next reserved word, the branch address to the processing unit "process 0" is stored in every element of the character string search table in which nothing has yet been stored.
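
The twenty procedures above can be sketched compactly. The Python below is a minimal illustration under stated assumptions, not the patented implementation: the name `build_table`, the sparse dictionary keyed by (state value, character), and the tagged tuples `("goto", state)` and `("match", word)` standing in for branch addresses are all choices made here. Elements absent from the dictionary play the role of elements holding the branch address to "process 0". One deliberate difference is flagged in the comments: where the sixteenth procedure names the column as state value "B", the sketch follows the entries already stored for the shared prefix to find the state that prefix leads to. The two coincide in the worked examples (state 2 after "NE" for NET, state 1 after "N" for NIMS).

```python
def build_table(words, delimiter=" "):
    """Return (table, number_of_state_columns) for the given reserved words."""
    table = {}   # (state value, character) -> ("goto", state) or ("match", word)
    n = 0        # highest state value allocated so far
    prev = ""    # previously registered reserved word
    for word in sorted(words):                      # third procedure
        a = len(word)                               # tenth procedure: A
        b = 0                                       # eleventh procedure: B
        while b < min(a, len(prev)) and word[b] == prev[b]:
            b += 1
        # Assumption of this sketch: locate the state reached after the
        # shared prefix by following existing entries, rather than using B.
        state = 0
        for ch in word[:b]:
            state = table[(state, ch)][1]
        # Twelfth to seventeenth procedures: one new state per new character.
        for ch in word[b:]:
            n += 1
            table[(state, ch)] = ("goto", n)
            state = n
        # Ninth / eighteenth procedure: the delimiter in the final state
        # branches to the process for the found reserved word.
        table[(state, delimiter)] = ("match", word)
        prev = word
    return table, n + 1                             # columns are 0 .. n


table, columns = build_table(["NEC", "NET", "NIMS", "XYZ"])
print(columns, table[(1, "I")], table[(7, " ")])
```

A sparse dictionary is used only to keep the sketch short; a dense two-dimensional array with every unused element set to "process 0", as the twentieth procedure describes, would hold the same information.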

【００６５】Next, the processing flow of the character string search of this embodiment is described. FIG. 3 shows the processing flow of the character string search method of this embodiment. First, at 300, the state value is initialized to "0". As an example, consider the case where the character string NEC is input, followed by a blank character as the delimiter. First, the read processing unit 301 reads the character "N".

【００６６】Next, at 302, the address of "process 1", stored at the position in the character string search table of FIG. 6 where the character element is "N" and the state value element is "0", is obtained, and control jumps to "process 1", one of the processes in 303. "Process 1" updates the state value to "1", and control returns to the read processing unit 301. Similarly, at 302, the address of "process 2", stored at the position in the table of FIG. 6 where the character element is the next input character "E" and the state value element is "1", is obtained, and control jumps to "process 2" in 303. "Process 2" updates the state value to "2", and control returns to the read processing unit 301.

【００６７】Further, at 302, the address of "process 3", stored at the position in the character string search table of FIG. 6 where the character element is the next input character "C" and the state value element is "2", is obtained, and control jumps to "process 3" in 303. "Process 3" updates the state value to "3", and control returns to the read processing unit 301.

【００６８】Further, at 302, the address of "process NEC", stored at the position in the character string search table of FIG. 6 where the character element is the next input character, the blank character, and the state value element is "3", is obtained, and control jumps to "process NEC" in 303. At the moment control jumps to "process NEC", the reserved word NEC has been found.

【００６９】Next, the processing for the case where the reserved word NEC is found is executed within "process NEC", the state value is updated to "0", and control returns to the character read unit 301 to prepare for the search for the next character string.

【００７０】As another example, consider the case where the character string "ABC", which is not among the reserved words, is input. In the initial state of the program, the state value is "0". First, the read processing unit 301 reads the character "A". Next, at 302, the address of "process 0", stored at the position in the character string search table of FIG. 6 where the character element is "A" and the state value element is "0", is obtained, and control jumps to "process 0" in 303. "Process 0" updates the state value to "0", and control returns to the read processing unit. That is, the state value remains "0". The same holds for the subsequent input characters "B" and "C", so the input character string ABC is ignored.
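
The two walkthroughs above (NEC followed by a delimiter, then the unregistered string ABC) can be reproduced with a small sketch. The Python below is illustrative and rests on assumptions: the names `make_table` and `search` are chosen here, Python closures stand in for the processing units reached through branch addresses, and the table holds only the entries for NEC and NET as described for FIG. 6. Rows are ASCII character codes and columns are state values, following the layout in the description; input is assumed to be ASCII.

```python
DELIMITER = " "
N_CHARS = 128   # rows: one per ASCII character code
N_STATES = 5    # columns: state values 0..4 (NEC and NET only)


def make_table(on_match):
    """Build the rows-by-columns table of handlers for NEC and NET."""
    def goto(new_state):
        # Stands in for a state-change processing unit such as "process 1".
        return lambda: new_state

    def matched(word):
        # Stands in for "process NEC" / "process NET": report, then reset to 0.
        def handler():
            on_match(word)
            return 0
        return handler

    process_0 = goto(0)
    # Every element starts out branching to "process 0".
    table = [[process_0] * N_STATES for _ in range(N_CHARS)]
    table[ord("N")][0] = goto(1)
    table[ord("E")][1] = goto(2)
    table[ord("C")][2] = goto(3)
    table[ord("T")][2] = goto(4)
    table[ord(DELIMITER)][3] = matched("NEC")
    table[ord(DELIMITER)][4] = matched("NET")
    return table


def search(text):
    """Return the reserved words found in ASCII text, in input order."""
    found = []
    table = make_table(found.append)
    state = 0                                  # 300: initialize the state value
    for ch in text:                            # 301: read one character
        state = table[ord(ch)][state]()        # 302/303: look up and branch
    return found


print(search("NEC ABC NET "))
```

With the input "NEC ABC NET ", the sketch reports NEC and NET and leaves ABC unreported, since every character of ABC lands on "process 0".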

【００７１】Generalizing the processing flow of the character string search gives the flow shown in FIG. 1. In FIG. 1, the processing blocks denoted by reference numerals 100 through 102 are identical to those denoted by reference numerals 300 through 302. "Process M+1" (reference numeral 105) corresponds to "process NEC", "process NET", "process NIMS", and "process XYZ". "Process N" corresponds to "process 10".

【００７２】In this embodiment, the characters handled are those of the ASCII code system, but the characters are not limited to this system; the present invention can be applied to characters of any system, such as JIS kanji or Unicode.

【００７３】The character string search method and the method for creating the character string search table in this embodiment are implemented as computer-executable programs, which are recorded on a computer-readable recording medium.

【００７４】The character string search table of this embodiment is likewise recorded on a computer-readable recording medium. The table is created by a computer running the program of the creation method of this embodiment and is recorded on the recording medium; the computer then reads the table from the recording medium and uses it when executing the program of the search method of this embodiment.

【００７５】[0075]

[Effects of the Invention] As described above, according to the present invention, character strings can be searched faster than with the hash search method.

【００７６】The reason is as follows. The character string search method according to the present invention uses the input character and the current state value as the position information of an element in the array of the character string search table, refers to the address of the branch destination process stored in the element at the position specified by the state element value and the character element value, and branches to it. Because the character string is recognized by repeating this branch destination processing, none of the time-consuming arithmetic instructions among CPU operations, such as multiplication and division, are used. Moreover, whether an input character string is a reserved word can be determined in a single search, so the character string search is fast.

【００７７】As a rough guide, the character string search method according to the present invention is compared here with the simple hash search method shown in FIG. 10, in which the input character string is converted to an integer and divided by a prime number M, with the remainder used as the function value, to show how much faster the method according to the present invention is.

【００７８】Taking NEC's μPD70208 CPU as an example, the number of clocks required for each operation is as follows.

【００７９】
Addition: 14 clocks
Subtraction: 14 clocks
Multiplication: 39 clocks
Division: 34 clocks
Comparison: 16 clocks
8-bit left shift: 15 clocks
Branch: 15 clocks

From these figures, the CPU time required by the hash search method and the CPU time required by the search method of the present invention compare as follows.

【００８０】
CPU time of the hash search method = 77 clocks × number of characters
(8-bit left shift: 15 clocks × 1 = 15 clocks)
(addition: 14 clocks × 2 = 28 clocks)
(division: 34 clocks × 1 = 34 clocks)

CPU time of the search method according to the present invention = 30 clocks × number of characters
(branch: 15 clocks × 2 = 30 clocks)

Thus, using the search method according to the present invention shortens the CPU time to approximately 1/2 (30/77 ≈ 0.39), enabling high-speed character string search. Moreover, the more hash functions are used and the more operations they require, the wider the gap between the two becomes.
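
The per-character clock totals can be reproduced with simple arithmetic. The Python below only restates the figures given in the description; the dictionary keys and variable names are labels chosen here for illustration.

```python
# Clock counts quoted in the description for the μPD70208.
CLOCKS = {"shift_left_8": 15, "add": 14, "divide": 34, "branch": 15}

# Hash search per character: one 8-bit left shift, two additions, one division.
hash_per_char = CLOCKS["shift_left_8"] * 1 + CLOCKS["add"] * 2 + CLOCKS["divide"] * 1

# Table-driven search per character: two branches.
table_per_char = CLOCKS["branch"] * 2

ratio = table_per_char / hash_per_char
print(hash_per_char, table_per_char, round(ratio, 2))
```

The totals are 77 and 30 clocks per character, a ratio of about 0.39.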

[Brief description of the drawings]

FIG. 1 is a flowchart of a character string search method using a character string search table according to the present invention.

FIG. 2 is a flowchart of a method for creating a character string search table according to the present invention.

FIG. 3 is a flowchart of a character string search method using a character string search table in the embodiment of the present invention.

FIG. 4 is a diagram showing the table size of the character string search table required to search for the reserved word "NEC" in the embodiment of the present invention.

FIG. 5 is a diagram showing the character string search table for searching for the reserved word "NEC" in the embodiment of the present invention.

FIG. 6 is a diagram showing the character string search table for searching for the reserved words "NEC" and "NET" in the embodiment of the present invention.

FIG. 7 is a diagram showing the character string search table for searching for the reserved words "NEC", "NET", and "NIMS" in the embodiment of the present invention.

FIG. 8 is a diagram showing the character string search table for searching for the reserved words "NEC", "NET", "NIMS", and "XYZ" in the embodiment of the present invention.

FIG. 9 is a flowchart of a hash table creation method in the hash search method.

FIG. 10 is a flowchart of the processing inside a hash function.

FIG. 11 is a flowchart of the hash search method.

[Explanation of symbols]

100 Initialization
101 Read processing unit
102 Jump
103 Process 0
104 Process 1
105 Process M+1
106 Process N

Continuation of the front page

(56) References
JP-A-4-348469 (JP, A)
JP-A-63-311530 (JP, A)
JP-A-3-75869 (JP, A)
JP-A-2-255985 (JP, A)
Junichi Aoe, "Tries and Their Applications", Information Processing, Vol. 34, No. 2, 244-251 (February 15, 1993)
Tetsuro Ito, "Software Course 19: Information Retrieval", pp. 99-100, Shokodo (August 10, 1986)

(58) Fields searched (Int. Cl.⁷, DB name) G06F 17/30

Claims

(57) [Claims]

1. A method for creating a character string search table, the table being a two-dimensional array in which all characters that can be input form the rows and the types of state value indicating the character string search processing state form the columns, each array element storing either a branch destination address for branching to a processing procedure that changes the state value, determined in advance according to the search processing state of all reserved words, or a branch destination address for branching to a processing procedure executed when a reserved word is found, the table being created so that reserved words that share characters up to some point share the character string search processing state for the shared characters, the method comprising:
a first procedure of setting all character types that can be input as the size of the character string search table in the row direction;
a second procedure of preparing a state value "0" indicating the initial value of the character string search state and a processing unit "process 0" that changes the state value to "0";
a third procedure of sorting all reserved words to be registered;
a fourth procedure of extracting the first reserved word "reserved word 1" from all the sorted reserved words and preparing state values "1" through "n" corresponding respectively to the first through n-th characters constituting "reserved word 1" (n being the character string length of reserved word 1);
a fifth procedure of setting the range from state value "0" to state value "n" as the size of the character string search table in the column direction;
a sixth procedure of preparing n processing units, from "process 1", which changes the state value to "1", through "process n", which changes the state value to "n";
a seventh procedure of preparing a process "process reserved word 1" that is executed when the reserved word "reserved word 1" is found;
an eighth procedure of storing, in the character string search table, the branch addresses to "process 1" through "process n" in the elements whose rows are the first through n-th characters of "reserved word 1" and whose columns are the state values "0" through "n-1", respectively;
a ninth procedure of storing, in the character string search table, the address of the process "process reserved word 1" in the element whose row is the delimiter character and whose column is the state value "n";
a tenth procedure of, when a subsequent reserved word "reserved word m" exists, extracting it and obtaining the number of characters A of "reserved word m";
an eleventh procedure of comparing the reserved word "reserved word m" with the reserved word "reserved word m-1" and obtaining the number B of identical consecutive characters, counted from the first character toward the n-th character of both reserved words;
a twelfth procedure of newly preparing A-B state values, from state value "n+1" through state value "n+A-B";
a thirteenth procedure of redefining the range from state value "0" to state value "n+A-B" as the size of the character string search table in the column direction;
a fourteenth procedure of preparing A-B processing units, from "process n+1", which changes the state value to "n+1", through "process n+A-B", which changes the state value to "n+A-B";
a fifteenth procedure of preparing a process "process reserved word m" that is executed when the reserved word "reserved word m" is found;
a sixteenth procedure of storing, in the character string search table, the branch address to the processing unit "process n+1" in the element whose row is the (B+1)-th character of "reserved word m" and whose column is the state value "B";
a seventeenth procedure of storing, in the character string search table, the branch addresses to "process n+2" through "process n+A-B" in the elements whose rows are the (B+2)-th through A-th characters of "reserved word m" and whose columns are the state values "n+1" through "n+A-B-1", respectively;
an eighteenth procedure of storing, in the character string search table, the address of the process "process reserved word m" in the element whose row is the delimiter character and whose column is the state value "n+A-B";
a nineteenth procedure of assigning the state value "n+A-B" to the state value "n"; and
a twentieth procedure of, after the tenth through nineteenth procedures have been repeated for all the remaining sorted reserved words and all reserved words have been registered, storing the branch address to the processing unit "process 0" in every element of the character string search table in which nothing is stored.

2. A computer-readable recording medium on which a program for causing a computer to execute the method for creating a character string search table according to claim 1 is recorded.

3. A character string search method comprising: the first through twentieth procedures of the method for creating a character string search table according to claim 1; a twenty-first procedure of initializing the state value to 0; a twenty-second procedure of obtaining input characters one at a time; a twenty-third procedure of referring to the character string search table with the obtained input character as the row and the state value as the column, and branching to the address stored in the element; and a twenty-fourth procedure of performing the state value change processing or the reserved word match processing and returning to the twenty-second procedure.

4. A computer-readable recording medium on which a program for causing a computer to execute the character string search method according to claim 3 is recorded.

5. A computer-readable recording medium on which a character string search table created by the method of claim 1 is recorded.