JPH06274701A

JPH06274701A - Word collating device

Info

Publication number: JPH06274701A
Application number: JP5062182A
Authority: JP
Inventors: Keiko Hara; 恵子原; Tsuyoshi Kitani; 強木谷; Toshiyuki Yoshida; 敏之吉田
Original assignee: N T T DATA TSUSHIN KK; NTT Data Communications Systems Corp
Current assignee: N T T DATA TSUSHIN KK; NTT Data Corp
Priority date: 1993-03-22
Filing date: 1993-03-22
Publication date: 1994-09-30

Abstract

PURPOSE: To provide a word collating device which, in particular for recognized candidate characters supplied from a character recognizing device, can retrieve at high speed a word for which more than a fixed proportion of the correct characters appear in the character string formed by combining the recognized candidate characters. CONSTITUTION: For each of the recognized candidate characters supplied to a word retrieval processor 1, the word numbers of the words containing that candidate character are read out of the corresponding 1st to N-th character word tables and registered in a word register table 7 under the control of a table registration processing part 2. The word numbers registered in the table 7 are then sorted and the emerging frequency of each word is calculated. Next, the number of characters of a word having a large emerging frequency is read out of a character number table 9. This character number is compared with the word emerging frequency by an output word decision processing part 4. If the coincidence between this number and the frequency exceeds a prescribed degree, the part 4 decides that the word formed by those component characters has been recognized.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a word collating device, and more particularly to a word collating device that collates the recognition candidate characters output from a character recognition device character by character and retrieves the correct word.

[0002]

[Prior Art] When a character recognition device performs recognition processing on a character string read by a sensor such as a CCD, a plurality of similar recognition candidate characters are output for a single character string. To determine the correct characters for such a character string, it is collated against a word dictionary. For example, in a conventional word collating device, when recognition candidate characters are read out, a table in which a large number of words are registered in advance is searched for that character string, and the matching word is read out. However, not all the characters making up the recognition candidate characters necessarily match the word to be read out. For this reason, a word has been retrieved as the target word if, for example, all but one of its characters match.

[0003]

[Problem to Be Solved by the Invention] In the conventional word collating device described above, the number of words that must be searched in the table for a single character string is enormous, and the search takes time.

[0004] The present invention has been made in view of the above circumstances, and its object is to provide a word collating device capable of performing word search at high speed.

[0005]

[Means for Solving the Problem] One configuration example of the word collating device of the present invention comprises: character-position-specific word table means, in which a word number is set for each predetermined word and the word number is registered for each character constituting the word (for example, for four-character words there is a table for each character position from the first character to the fourth character, and each table registers the word numbers of the words containing a given character); input means for inputting a recognition candidate character string resembling the target word, output for example from a character recognition device; word number search means for searching the character-position-specific word table means for the recognition candidate character string input from the input means and reading out the word numbers for each character of the recognition candidate character string; word number appearance count calculating means for obtaining the number of appearances of each word number found by the word number search means, the number of appearances being the total over the word numbers found for every character (for example, for recognition candidate characters of four characters, the total over the word numbers found for the four characters); and output word determining means for comparing, on the basis of the word number appearance counts calculated by the word number appearance count calculating means, the number of constituent characters of a word whose word number appears frequently with its number of appearances, and determining the word according to the degree of coincidence.

[0006] Another configuration example of the word collating device of the present invention has character-position-specific word table means and input means similar to the above, together with: table registration processing means for searching the character-position-specific word table means for the recognition candidate character string input from the input means, determining a table address using a hash function, and registering each word found and its number of appearances in a table; and output word determining means for determining the word according to the degree of coincidence obtained by comparing the word's number of appearances with its number of constituent characters.

[0007]

[Operation] For each character constituting the recognition candidate characters, the word collating device of the present invention registers a word number in a table every time a word containing that character appears, finally sorts the table by word number to obtain the word appearance counts, compares the number of constituent characters of the target word with its appearance count, and determines the word on the basis of a fixed degree of coincidence. This allows word search processing to be performed at high speed.

[0008] Furthermore, by counting word appearances using a hash function and writing the counts to the address designated by the table registration processing means, word search processing can be performed even faster.

[0009]

[Embodiment] An embodiment of the present invention will now be described with reference to the drawings. FIG. 1 is a system configuration diagram illustrating the word collating device of this embodiment. In the figure, character data output from a character recognition device (not shown) is input to a word search processing unit 1, which serves as the input means and the word number search means. The character recognition device reads, for example, handwritten characters with a sensor, analyzes the characters read, and outputs similar characters. The word search processing unit 1 cuts out a predetermined character string from the character string output by the character recognition device and treats it as the recognition candidate character string. The word search processing unit 1 is also connected to a memory unit 5 and, referring to a character-position-specific word table 6 (described later) in the memory unit 5, searches for word numbers for each character of the candidate character string. That is, it performs a word search for each character, from the first character to the N-th character of the candidate character string.

[0010] The table registration processing unit 2 registers the word numbers output from the word search processing unit 1 in a word registration table 7 in the memory unit 5, in the order in which they were found. As described later, the table registration processing unit 2 registers word numbers in the word registration table 7 character by character.

[0011] The word appearance count calculation unit 3, which serves as the word number appearance count calculating means, is connected to the word registration table 7; it reads out the word numbers registered in the word registration table 7 and calculates the number of appearances of each word number. This calculation covers the word numbers obtained by all the search processing from the first character to the N-th character of the recognition candidate character string. The result calculated by the word appearance count calculation unit 3 is output to an output word determination processing unit 4.

[0012] The output word determination processing unit 4, which serves as the output word determining means, is connected to a notation table 8, a character number table 9, and a minimum matching character number table 10 provided in the memory unit 5, and determines from the result output by the word appearance count calculation unit 3 which word the recognition candidate characters correspond to.

[0013] FIG. 2 illustrates the specific memory configuration of the character-position-specific word table 6 provided in the memory unit 5. The character-position-specific word table 6, which serves as the character-position-specific word table means, consists of N word tables, from the first-character word table shown in (a) of the figure to the N-th-character word table shown in (b), and each word table consists of a key character information section 6a and a word number information section 6b. As shown in (a) and (b) of the figure, the key character information section 6a consists of a key character area, a word count area, and an area holding a pointer to the word number information. The key character area stores the first characters of the words registered in advance; for example, the characters 「歩」 and 「書」, which are the first characters of words such as 「歩き疲れ」 ("tired from walking") and 「書き換え」 ("rewriting"), are registered. The word count area holds the number of words, among those registered in the character-position-specific word table 6, that contain the character stored in the key character area as their first character; for 「歩」 these are words such as 「歩き疲れ」, 「歩き過ぎ」 ("walking too much") and 「歩いて」 ("walking"). That is, the "78" shown in (a) of the figure means that 78 words containing the character 「歩」 as the first character are registered. Likewise, when the character registered in the key character area is 「書」, it means that 45 words containing 「書」 as the first character, such as 「書き換え」 and 「書き間違え」 ("miswriting"), are registered in the character-position-specific word table 6.
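The layout of one word table can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `build_word_table` and `lookup`, the four-word toy dictionary, and the use of list indices in place of memory addresses are all hypothetical; only the word numbers 35, 100, 700 and 920 are taken from the description.

```python
def build_word_table(words, position):
    """Build (key_section, number_section) for one character position.

    key_section maps a key character to (word count, pointer); the pointer is
    an index into number_section and stands in for a memory address.
    """
    by_char = {}
    for number, word in sorted(words.items()):
        if position < len(word):
            by_char.setdefault(word[position], []).append(number)

    key_section, number_section = {}, []
    for ch, numbers in by_char.items():
        key_section[ch] = (len(numbers), len(number_section))
        number_section.extend(numbers)
    return key_section, number_section


def lookup(table, ch):
    """Return the word numbers registered for key character ch (may be empty)."""
    key_section, number_section = table
    count, pointer = key_section.get(ch, (0, 0))
    return number_section[pointer:pointer + count]


# Hypothetical toy dictionary: word number -> notation.
WORDS = {35: "歩き始め", 100: "書き換え", 700: "書き留める", 920: "書きもの"}
first_char_table = build_word_table(WORDS, 0)
```

With this toy dictionary, `lookup(first_char_table, "書")` returns `[100, 700, 920]`, and a character that is not a key character yields an empty list.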

[0014] Character strings such as 「歩き始め」 ("starting to walk") and 「書き換え」 ("rewriting") are each assigned a word number in advance, and these word numbers are stored in the word number information section 6b. For example, the character string 「書き換え」 is assigned word number "100", and the character string 「歩き始め」 is assigned word number "35". The address used to search the word number information section 6b is designated by the pointer data stored in the pointer area for the word number information. For example, for the character 「歩」, the pointer data "19048" to the corresponding word number information designates the topmost address of the word number information section 6b shown in (a) of the figure. Similarly, the pointer data "20684" to the word number information corresponding to the character 「書」 designates, for example, the second address of the word number information section 6b.

[0015] The above describes the table configuration of the first-character word table; the word tables for the second through N-th characters are configured in the same way. FIG. 3 shows the configuration of the notation table 8. The notation table 8 stores the character information of the character strings stored in the character-position-specific word table 6; for example, word information such as 「あうん」, 「あさ」, 「あの」, 「あれ」, ..., the above-mentioned 「歩き疲れ」, ..., and 「書き換え」 is registered. As described above, each of these words is assigned a word number, and the word number, the number of constituent characters of the word, and the address into the notation table 8 are registered in the character number table 9 shown in FIG. 4. For example, as noted above, the character string 「書き換え」 has word number "100"; once word number "100" is found in the character number table 9, its character count is obtained and, by following the pointer data into the notation table 8, the notation 「書き換え」 can be found as well.
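The relationship between the character number table and the notation table might be sketched as below. The packed-string notation table, the function names `build_tables` and `notation_of`, and the three-word dictionary are assumptions made for illustration; pointers here are string offsets rather than real addresses such as "108464".

```python
def build_tables(words):
    """Return (notation, char_numbers).

    notation is one packed string of all word notations; char_numbers maps a
    word number to (character count, pointer into notation).
    """
    notation, char_numbers = "", {}
    for number, word in sorted(words.items()):
        char_numbers[number] = (len(word), len(notation))
        notation += word
    return notation, char_numbers


def notation_of(number, notation, char_numbers):
    """Follow the pointer for a word number and read back its notation."""
    length, pointer = char_numbers[number]
    return notation[pointer:pointer + length]


# Hypothetical toy dictionary using word numbers from the description.
WORDS = {35: "歩き始め", 100: "書き換え", 700: "書き留める"}
NOTATION, CHAR_NUMBERS = build_tables(WORDS)
```

Looking up word number 100 gives a character count of 4, and `notation_of(100, NOTATION, CHAR_NUMBERS)` reads back 「書き換え」.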

[0016] FIG. 5 illustrates the table configuration of the minimum matching character number table 10, in which the minimum number of matching characters is registered for each word length (the number of constituent characters of a word). For a registered word of a given word length, the minimum number of matching characters is the number of matching characters at which the candidate characters can be judged, without error, to be that word. For example, for words of word length "4" such as 「書き換え」 and 「歩き疲れ」, the minimum number of matching characters is "3"; for word length "3" it is "2"; and for word lengths "9" and "10" it is "6".
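As a small illustration, the minimum matching character number table can be modeled as a mapping from word length to threshold. Only the four entries stated above are included; the name `min_matching_chars` and the choice to return `None` for lengths the description does not mention are assumptions.

```python
# Word length -> minimum number of matching characters.
# Only these four entries are stated in the description; the values for other
# word lengths are not given, so they are deliberately left out.
MIN_MATCH = {3: 2, 4: 3, 9: 6, 10: 6}


def min_matching_chars(word_length):
    """Return the threshold for a word length, or None if none is stated."""
    return MIN_MATCH.get(word_length)
```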

[0017] The specific word collating operation of the word collating device configured as above is described below. FIG. 6 shows an example of the recognition candidate characters output from the character recognition device (not shown) when character recognition is performed on, for example, the word 「書き換え」 ("rewriting"). The characters 「書」, 「き」 and 「換」 enclosed in boxes (□) in the figure are the correctly recognized characters, but the character recognition device also recognizes other characters shown in the figure, such as 「歩」, 「見」 and 「さ」, as candidate characters and outputs them to the word search processing unit 1. When this recognition candidate character string is input, the word search processing unit 1 first performs search processing on the first-ranked character at the first character position of the recognition candidate character string. That is, it searches the first-character word table described above for the word numbers of words whose first character is the first-ranked character 「歩」.

[0018] FIG. 7 shows the flow of this search processing and the subsequent word collating processing. For the first-ranked character 「歩」 at the first character position, the first-character word table in the character-position-specific word table 6 is searched and the word numbers of the words containing the character 「歩」 are read out. That is, as shown in FIG. 2(a), when the character 「歩」 is searched for, the word numbers of the words containing 「歩」 are read out of the word number information section 6b. The word numbers read out at this time are "35", "59", "83", and so on, as shown in FIG. 2(a). The word numbers read out in this way are output to the table registration processing unit 2 and stored in the word registration table 7 under the control of the table registration processing unit 2. Item a in FIG. 7 shows that word number "35" (which corresponds to the word 「歩き始め」) has been stored in the word registration table 7 by this processing. Although not shown, the word numbers read out of the word number information section 6b, such as "53" and "83", are written below word number "35" in FIG. 7.

[0019] Next, the word search processing unit 1 reads out of the word number information section 6b of the first-character word table the word numbers of words whose first character is 「書」, the second-ranked character at the first character position. The word numbers read out at this time are "100", "104", "920", and so on, as shown in the figure, and they are written to the word registration table 7 under the control of the table registration processing unit 2 in the same way as above. Item b in FIG. 6 shows that word numbers such as "920" and "100" have been stored in the word registration table 7 by this processing (word number "920" corresponds to, for example, the word 「書きもの」 ("writing"), and word number "100" to the word 「書き換え」).

[0020] Search processing is then performed in the same way up to 「と」, the last (N-th ranked) character at the first character position, and the word numbers read out are registered in the table. They may simply be registered sequentially in the order in which they were found. As a result of the above processing, word numbers such as "35", "920", "100", "78" and "1230" are registered in the word registration table 7, as shown in FIG. 7.

[0021] When the word search processing and table registration processing have been completed for the first- through N-th-ranked characters at the first character position, the same processing is performed starting with 「き」, the first-ranked character at the second character position. At this time, a word that already appeared for the first character is registered in the word registration table 7 again if it reappears. For example, the word search processing unit 1 reads out of the second-character word table the word numbers of words whose second character is 「き」, the first-ranked character at the second character position, and, as shown at c in FIG. 2, writes word numbers such as "100", "700" and "35" to the word registration table 7, including the word numbers "35" and "100" already found for the first character. Here word number "100" corresponds to the word 「書き換え」 as noted above, word number "700" to the word 「書き留める」 ("to write down"), and word number "35" to the word 「歩き始め」.

[0022] Search processing is performed in the same way up to the last (N-th ranked) character at the second character position, and then likewise from the third character position to the N-th character position. The results of this search are registered in the word registration table 7 as above; for example, as shown in FIG. 7, word numbers such as "100", "46" and "700" are registered for the third character, and word numbers such as "65", "10" and "1054" are likewise stored for the fourth character. Since the example word 「書き換え」 is a four-character word, the fourth character is the N-th character, and the word search processing ends there in this example.
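The search and registration loop described in the preceding paragraphs can be sketched as follows. This is a simplified illustration: the function name `register_word_numbers`, the per-position tables, and the candidate lists are hypothetical toy data built around word numbers 35, 100 and 700 from the description, not the contents of FIG. 6 or FIG. 7.

```python
def register_word_numbers(candidates, tables):
    """Collect word numbers position by position, rank by rank, in search order.

    candidates[i] lists the ranked candidate characters for position i + 1;
    tables[i] maps a character to the word numbers having it at position i + 1.
    A word number that reappears is registered again, not merged.
    """
    registered = []
    for position, ranked_chars in enumerate(candidates):
        table = tables[position] if position < len(tables) else {}
        for ch in ranked_chars:
            registered.extend(table.get(ch, []))
    return registered


# Hypothetical tables for positions 1 to 4
# (35 = 歩き始め, 100 = 書き換え, 700 = 書き留める).
TABLES = [
    {"歩": [35], "書": [100, 700]},
    {"き": [35, 100, 700]},
    {"始": [35], "換": [100], "留": [700]},
    {"め": [35, 700], "え": [100]},
]
# Hypothetical candidates; the fourth character is assumed to be misrecognized.
CANDIDATES = [["歩", "書"], ["き", "さ"], ["換", "見"], ["ん"]]
registered = register_word_numbers(CANDIDATES, TABLES)
```

With this toy input, `registered` is `[35, 100, 700, 35, 100, 700, 100]`: word number 100 is registered three times even though the fourth character did not match.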

[0023] Next, the word appearance count calculation unit 3 reads out the word numbers stored in the word registration table 7 and sorts them. The sort covers all the word numbers registered in the word registration table 7, regardless of whether they were stored for the first character, the second character, and so on. All the registered word numbers are thus rearranged in word number order, as shown at d in FIG. 7. Once the word numbers have been rearranged in this way, word numbers that were read out more than once are always adjacent, so their appearance counts are easily obtained. For example, word number "100" appears three times, and word numbers "35" and "700" appear twice each. This appearance count data is output to the output word determination processing unit 4.
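A minimal sketch of this sort-then-count step is shown below. The function name `count_appearances` and the input list are illustrative assumptions; the list is chosen so that the resulting counts agree with the example above (three for 100, two each for 35 and 700).

```python
def count_appearances(registered):
    """Sort word numbers, then count each run of equal values in one pass."""
    counts = []
    for number in sorted(registered):
        if counts and counts[-1][0] == number:
            counts[-1][1] += 1          # same word number as the previous entry
        else:
            counts.append([number, 1])  # first entry of a new run
    return [tuple(pair) for pair in counts]


# Hypothetical contents of the word registration table, in search order.
registered = [35, 100, 700, 35, 100, 700, 100]
appearances = count_appearances(registered)
```

Because equal word numbers are adjacent after sorting, a single linear pass is enough to obtain every count.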

[0024] When the appearance count data is input, the output word determination processing unit 4 outputs the word number with the largest appearance count to the character number table 9 and obtains the character count of that word number. In the above example, word number "100" has the largest appearance count, 3, so the character number table 9 is searched for word number "100". This processing yields a character count of 4 for word number "100"; the notation table 8 is then searched according to the corresponding pointer into the notation table 8 (pointer data "108464" in this case), and the word 「書き換え」 corresponding to word number "100" is read out.

[0025] Next, the output word determination processing unit 4 outputs the character count read from the character number table 9 (for example, 4) to the minimum matching character number table 10 and reads out the minimum number of matching characters. In the above example the character count for word number "100" is "4", so the minimum number of matching characters defined in the minimum matching character number table 10 is "3", and this value "3" is output to the output word determination processing unit 4. The output word determination processing unit 4 compares the minimum number of matching characters, "3", with the appearance count of word number "100" (the word 「書き換え」) and thereby performs word collation. Here the value read from the minimum matching character number table 10 is "3" and the appearance count output from the word appearance count calculation unit 3 is also "3", so the two values agree. That is, an appearance count of "3" means that three characters of the recognition candidate characters matched 「書き換え」 (word "100"), the presumed correct word (target word), and the minimum number of matching characters "3" is defined on the basis that a four-character word can be judged to be the correct word without error if three of its characters match. From the agreement of the two values, the output word determination processing unit 4 therefore takes the word 「書き換え」 with word number "100" to be the correct word, as the result of collating the recognition candidate characters read from the character recognition device, and outputs the word 「書き換え」.
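The decision step might be sketched as follows. The description only works through the case where the appearance count equals the minimum number of matching characters; treating "at least the minimum" as acceptance is an assumption, as are the function name `decide_output_word` and the dictionary-based tables.

```python
# Hypothetical stand-ins for the tables in the memory unit.
CHAR_NUMBERS = {35: 4, 100: 4, 700: 5}   # word number -> character count
NOTATION = {35: "歩き始め", 100: "書き換え", 700: "書き留める"}
MIN_MATCH = {3: 2, 4: 3, 9: 6, 10: 6}    # only the entries stated in the text


def decide_output_word(appearances):
    """Return the notation of the most frequent word if it meets its threshold.

    appearances maps word number -> appearance count. Returns None when there
    is no candidate, no stated threshold, or the count falls short.
    """
    if not appearances:
        return None
    number, count = max(appearances.items(), key=lambda item: item[1])
    threshold = MIN_MATCH.get(CHAR_NUMBERS[number])
    if threshold is not None and count >= threshold:  # assumed acceptance rule
        return NOTATION[number]
    return None


result = decide_output_word({35: 2, 100: 3, 700: 2})
```

Here `result` is 「書き換え」, since word number 100 appears three times and a four-character word needs three matches; with only two matches for every word, the function returns `None`.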

[0026] Processing as described above makes it easy to perform word collation. FIG. 8 is a configuration diagram of a word collating device illustrating another embodiment of the present invention.

[0027] This embodiment uses a hash function for word search, and FIG. 8 is its system configuration diagram. It consists of a word search processing unit 11, a table registration processing unit 12, an output word determination processing unit 13, and a memory unit 15. Compared with the configuration of FIG. 1 described for the previous embodiment, FIG. 8 does not include the word appearance count calculation unit 3. The memory unit 15 consists of a character-position-specific word table 16, a word registration table 17, a notation table 18, a character number table 19, and a minimum matching character number table 20; this is the same as the configuration of the memory unit 5 shown in FIG. 1, except that the internal structure of the word registration table 17 differs. That is, in the word registration table 17, word numbers are registered in areas addressed on the basis of a hash function. The table registration processing unit 12 therefore uses a hash function to determine, from each word number found, the table address at which to register it, and registers the word number and its appearance count at the corresponding address of the word registration table 17.
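A hash-addressed word registration table of this kind could be sketched as below. The description does not specify the hash function or how address collisions are handled, so the modulo hash, the linear probing, the fixed table size, and the class name `WordRegistrationTable` are all assumptions for illustration.

```python
class WordRegistrationTable:
    """Fixed-size hash table whose slots hold [word number, appearance count]."""

    def __init__(self, size=64):
        self.slots = [None] * size

    def _address(self, number):
        # Hypothetical hash function; the description does not specify one.
        return number % len(self.slots)

    def register(self, number):
        """Store number with count 1, or increment its count if already present."""
        address = self._address(number)
        for _ in range(len(self.slots)):
            slot = self.slots[address]
            if slot is None:
                self.slots[address] = [number, 1]
                return
            if slot[0] == number:
                slot[1] += 1
                return
            address = (address + 1) % len(self.slots)  # assumed linear probing
        raise RuntimeError("word registration table is full")

    def count(self, number):
        """Return the appearance count of number, or 0 if it was never registered."""
        address = self._address(number)
        for _ in range(len(self.slots)):
            slot = self.slots[address]
            if slot is None:
                return 0
            if slot[0] == number:
                return slot[1]
            address = (address + 1) % len(self.slots)
        return 0


table = WordRegistrationTable(size=8)
for number in [35, 100, 700, 35, 100, 700, 100]:
    table.register(number)
```

Counts accumulate at registration time, so no separate sorting pass is needed: after the loop, `table.count(100)` is 3 and `table.count(35)` is 2. With a table size of 8, word numbers 100 and 700 hash to the same address, which exercises the probing path.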

【００２８】図９は上述の単語検索処理、テーブルへの
登録処理、及びその後の処理を説明する図である。すな
わち、単語検索処理部１は不図示の文字認識装置から出
力された認識候補文字の第１文字目から検索処理を行
い、第１文字目の第１位の「歩」を１文字目に持つ単語
の単語番号を第１文字目単語テーブルから得る。この単
語番号を元にハッシュ関数により単語登録テーブル１７
への登録先アドレスを求め、単語番号と出現回数を登録
する。このときの出現回数は１回とする。同様に、第１
文字目の第２位の「書」から第Ｎ文字目の「と」につい
て単語検索処理を行い、単語登録テーブル１７への登録
先アドレスを求め、単語登録テーブル１７の対応するエ
リアに登録処理を行う。上述の処理により、前述の実施
例と同様“３５”などの単語番号が登録され、第２位の
「書」について第２文字目単語テーブルを検索すると
“１００”などの単語番号が登録される。したがって、
上述の処理を第Ｎ位まで繰り返すと上述の“３５”、
“７８”、“１００”、“９２０”などの単語番号が検
索され、この時検索される単語番号はすべて始めて読み
出される単語番号である。この単語番号はテーブル登録
処理部１２の制御により図９に示す如く単語登録テーブ
ル１７に登録される。すなわち、単語登録テーブル１７
には読み出された単語番号と、その単語番号の出現回数
“１”が登録される。FIG. 9 is a diagram explaining the word search processing, the table registration processing, and the subsequent processing described above. The word search processing unit 1 starts the search from the first character of the recognition candidate characters output from a character recognition device (not shown) and obtains, from the first-character word table, the word numbers of words whose first character is 「歩」, the first-ranked candidate for the first character. A hash function is applied to each of these word numbers to obtain its registration address in the word registration table 17, and the word number and its appearance count are registered there. The appearance count at this point is 1. Likewise, word search processing is performed for the remaining candidates for the first character, from the second-ranked 「書」 to the N-th 「と」, the registration addresses in the word registration table 17 are obtained, and the registration is carried out in the corresponding areas of the word registration table 17. Through this processing, word numbers such as "35" are registered as in the preceding embodiment, and searching the second-character word table for the second-ranked 「書」 registers word numbers such as "100". Repeating the processing up to the N-th rank therefore retrieves word numbers such as "35", "78", "100", and "920", all of which are being read out for the first time at this point. Under the control of the table registration processing unit 12, these word numbers are registered in the word registration table 17 as shown in FIG. 9; that is, each retrieved word number is registered together with the appearance count "1".

【００２９】次に、第２文字目の第１位から第Ｎ位まで
同様に検索処理を繰り返すと、既に第１文字目で出現し
た単語番号が再度出現する場合がある。しかし、この場
合には、本実施例のテーブル登録処理部１２がハッシュ
関数に基づく登録処理を行う為、単語登録テーブル１７
（ハッシュテーブル）の対応する単語番号の登録回数は
インクリメントされる。すなわち、図９に示す如く再度
読み出された単語番号“３５”、“１００”については
出現回数“２”がセットされる。Next, when the search processing is repeated in the same way for the second character, from its first-ranked to its N-th ranked candidate, a word number that already appeared for the first character may appear again. In that case, because the table registration processing unit 12 of this embodiment performs registration based on the hash function, the registration count of the corresponding word number in the word registration table 17 (the hash table) is incremented. That is, as shown in FIG. 9, the appearance count "2" is set for the word numbers "35" and "100", which were read out again.

【００３０】以下同様にして、第３文字目、第４文字目
に対して単語検索処理を実行し、例えば第４文字目につ
いては、同図に示す単語番号“３５”、“９２０”、
“１００”、“１０”、“７８”、“１２３０”、“４
６”、“７００”などの単語番号が登録される。この中
で、３回検索された単語番号である“１００”には
“３”が設定され、２回検索された単語番号“３５”、
“７００”には“２”が設定される。Word search processing is then executed in the same way for the third and fourth characters. For the fourth character, for example, word numbers such as "35", "920", "100", "10", "78", "1230", "46", and "700" shown in the figure are registered. Among these, "3" is set for word number "100", which was retrieved three times, and "2" is set for word numbers "35" and "700", which were retrieved twice.
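The accumulation of appearance counts over the character positions can be sketched as below. The per-position lists in `retrieved_by_position` are hypothetical: the source states only the resulting counts ("100" three times, "35" and "700" twice) and does not say at which positions each word number was retrieved. A `Counter` stands in for the hash-addressed word registration table.

```python
from collections import Counter

# Hypothetical word numbers retrieved at character positions 1 to 4,
# chosen only so that the totals agree with the counts stated in the text.
retrieved_by_position = [
    [35, 78, 100, 920],  # first character
    [35, 100, 10],       # second character
    [100, 700, 1230],    # third character
    [46, 700],           # fourth character
]

appearance_counts = Counter()
for word_numbers in retrieved_by_position:
    # First appearance registers a count of 1; later ones increment it.
    appearance_counts.update(word_numbers)
```

After the loop, `appearance_counts` holds one entry per distinct word number, so no sorting pass is needed to find out how often each word was retrieved.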

【００３１】出力単語決定処理部１３は、上述の第２文
字目以降の検索処理以降、文字位置別単語テーブル１６
から単語が読み出される度に文字数テーブル１９と最小
一致文字数テーブル２０を参照し、単語登録テーブル１
７から得られる単語出現回数がその単語構成文字数の最
小一致文字数以上であるか否かを調べる。そして、前述
の実施例と同様、最小一致文字数以上であれば対応する
単語を出力する。この処理を認識候補文字列の最後、第
４文字目の「ん」まで繰り返す。例えば上述の例の場
合、第３文字目の第１位の文字「換」を含む単語番号を
単語登録テーブル１７に登録した時、その中に含まれる
単語番号“１００”は３回読み出されたことになり、単
語番号“１００”の対応するエリアには“３”が設定さ
れる。したがって、この時点で出力単語決定処理部１３
は、文字数テーブル１９を検索し単語番号“１００”の
構成文字数のデータ“４”を読み出し、この構成文字数
のデータ“４”に基づき最小一致文字数テーブル２０か
ら最小一致文字数“３”のデータを読み出す。出力単語
決定処理部４は上述のデータから前述の実施例と同様に
して正解単語は単語番号“１００”の「書き換え」であ
ると判断する。From the search processing for the second character onward, each time a word is read from the character-position-based word table 16, the output word determination processing unit 13 refers to the character number table 19 and the minimum matching character number table 20 and checks whether the word appearance count obtained from the word registration table 17 is at least the minimum matching character count for that word's number of constituent characters. As in the preceding embodiment, if the count is at least the minimum matching character count, the corresponding word is output. This processing is repeated up to 「ん」, the fourth and last character of the recognition candidate character string. In the example above, when the word numbers of words containing 「換」, the first-ranked candidate for the third character, are registered in the word registration table 17, word number "100" among them has been read out three times, so "3" is set in the area corresponding to word number "100". At this point the output word determination processing unit 13 searches the character number table 19, reads the constituent character count "4" for word number "100", and, based on this count, reads the minimum matching character count "3" from the minimum matching character number table 20. From these data, the output word determination processing unit 4 determines, as in the preceding embodiment, that the correct word is 「書き換え」 ("rewrite"), word number "100".
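The decision made by the output word determination processing can be sketched as a pair of table lookups followed by a comparison. The dictionaries below are stand-ins for the character number table 19, the minimum matching character number table 20, and the notation table 18; they contain only the entries that the text mentions, and the names `CHAR_COUNT`, `MIN_MATCH`, `NOTATION`, and `decide` are hypothetical.

```python
# Stand-ins for the tables in memory unit 15, holding only values from the text.
CHAR_COUNT = {100: 4, 35: 4}                  # word number -> constituent characters
MIN_MATCH = {4: 3}                            # constituent characters -> minimum matches
NOTATION = {100: "書き換え", 35: "歩き始め"}  # word number -> notation


def decide(word_number: int, appearance_count: int):
    """Return the word's notation if it has matched often enough, else None."""
    length = CHAR_COUNT[word_number]
    if appearance_count >= MIN_MATCH[length]:
        return NOTATION[word_number]
    return None


result = decide(100, 3)   # word 100 retrieved three times
pending = decide(35, 2)   # word 35 retrieved only twice
```

With the values from the text, word number 100 reaches the minimum of 3 matches for a 4-character word and is output, while word number 35 with 2 matches is not.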

【００３２】尚、上述の他の実施例では、単語検索処理
部１１によって候補単語とされた単語を、ハッシュ関数
を用いてテーブルに登録するとが、さらに単語構成文字
別の単語テーブルにも登録する構成としても良い。この
ように構成することで、単語構成文字数別の単語テーブ
ルと、最小一致文字数テーブルと、現在処理中の文字位
置と、単語出現回数とから、ある単語長以上の単語につ
いては、それ以上文字位置を進めて処理を行っても絶対
に照合結果が得られないことが分かる場合がある。すな
わち、どの単語も、現在の処理位置から後の全ての文字
位置で単語番号が出現したと仮定しても、単語構成文字
数の最小一致文字数以上にならない場合がある。この場
合を具体的に示すと、単語の構成文字数をＷｌとし、そ
の単語の出現回数をＷｈとし、その単語長の場合の最小
一致文字数をＭｉｎ（Ｗｌ）とし、現在の文字位置をｍ
とすると、Ｍｉｎ（Ｗｌ）−（Ｗｌ−ｍ）≦Ｗｈとなる場合は照合結果が得られない場合である。この場
合には、構成文字がＷｌ以上の単語については処理を継
続しないこととする。このことにより、単語照合の処理
時間を短縮することができる。In the other embodiment described above, the words identified as candidates by the word search processing unit 11 are registered in a table using a hash function; they may additionally be registered in word tables organized by the number of constituent characters. With this configuration, the word tables by constituent character count, the minimum matching character number table, the character position currently being processed, and the word appearance counts can together show that, for words of a certain length or longer, no collation result can be obtained however far the character position is advanced. That is, even if a word's number were assumed to appear at every character position after the current one, its appearance count might still fall short of the minimum matching character count for its length. Concretely, let Wl be the number of constituent characters of a word, Wh the number of appearances of that word, Min(Wl) the minimum matching character count for that word length, and m the current character position. A collation result can be obtained only if Min(Wl) − (Wl − m) ≤ Wh; when this inequality does not hold, no collation result can be obtained. In that case, processing is not continued for words with Wl or more constituent characters. This shortens the processing time for word collation.

【００３３】例えば、上述の単語番号“３５”の単語
「歩き始め」について検討すると、第４文字目の検索処
理が終了した時点で出現回数が２回であったとすると、
単語の構成文字数Ｗｌが“４”であり、単語の出現回数
Ｗｈが“２”であり、その単語長（Ｗｌ）の場合の最小
一致文字数をＭｉｎ（Ｗｌ）は前述の最小一致文字数テ
ーブル２０から“３”であり、現在の文字位置ｍは
“４”であることから上述の式Ｍｉｎ（Ｗｌ）−（Ｗｌ
−ｍ）≦Ｗｈは、３−（４−４）≦２となり、この式を
満足しないのでこの時点で以後の処理を打ち切るもので
ある。For example, consider the word 「歩き始め」 ("start walking") with word number "35" above, and suppose its appearance count is 2 when the search processing for the fourth character has finished. The number of constituent characters Wl is 4, the appearance count Wh is 2, the minimum matching character count Min(Wl) for that word length is 3 according to the minimum matching character number table 20, and the current character position m is 4. The expression Min(Wl) − (Wl − m) ≤ Wh therefore becomes 3 − (4 − 4) ≤ 2, which is not satisfied, so further processing is terminated at this point.
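The termination test can be sketched as a one-line predicate. The sketch reads the inequality Min(Wl) − (Wl − m) ≤ Wh as the condition under which a match is still reachable, which is how the worked example for word number "35" applies it; the function name `can_still_match` and its parameter names are hypothetical.

```python
def can_still_match(wl: int, wh: int, m: int, min_match: int) -> bool:
    """Check whether a word can still reach its minimum matching count.

    wl: number of constituent characters of the word (Wl)
    wh: appearance count so far (Wh)
    m: current character position (m)
    min_match: minimum matching character count for length wl (Min(Wl))

    At most (wl - m) further matches are possible, so the word remains
    a candidate only if min_match - (wl - m) <= wh.
    """
    return min_match - (wl - m) <= wh


# Worked example from the text: Wl = 4, Wh = 2, m = 4, Min(Wl) = 3.
keep_word_35 = can_still_match(wl=4, wh=2, m=4, min_match=3)

# One position earlier, the same count would still leave a match possible.
keep_at_position_3 = can_still_match(wl=4, wh=2, m=3, min_match=3)
```

In the worked example the predicate is false (3 − 0 ≤ 2 does not hold), so processing for that word stops; at position 3 with the same count it would be true (3 − 1 ≤ 2), so processing would continue.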

【００３４】[0034]

【発明の効果】以上詳細に説明したように、本発明によ
れば認識候補文字列の各文字毎にその文字を含む単語の
検索を行い、検索された単語をテーブルに登録し、この
テーブルをソートすることにより簡単に求められる単語
出現回数と、単語構成文字数を比較して得るその一致度
によって単語照合を行うので、従来のように単語が読み
出される度に出現回数の記録を行っていたテーブル内の
単語検索処理が不要となり、処理速度の高速化が可能と
なる。As described in detail above, according to the present invention, each character of the recognition candidate character string is used to search for words containing that character, the retrieved words are registered in a table, and word collation is performed using the degree of coincidence obtained by comparing the word appearance counts, which are easily obtained by sorting this table, with the number of constituent characters of each word. The in-table word lookup that was conventionally needed to record the appearance count each time a word was read out therefore becomes unnecessary, and the processing speed can be increased.

【００３５】また、ハッシュ関数を用いて決定したテー
ブルアドレスに単語出現回数を登録することにより、ハ
ッシュ関数を用いて単語とその出現回数とを高速に得る
ことができるから、さらに処理速度の高速化が実現でき
る。Furthermore, registering the word appearance count at a table address determined by a hash function allows a word and its appearance count to be obtained at high speed through the hash function, which increases the processing speed further.

【００３６】さらに、単語をテーブルに登録する際に、
単語構成文字数別のテーブルに登録することによって、
所定の文字位置以降での単語照合処理を省略することが
でき、この点からも処理速度の高速化が可能である。In addition, registering words in tables organized by the number of constituent characters when they are registered makes it possible to omit word collation processing after a given character position, which also increases the processing speed.

[Brief description of drawings]

【図１】一実施例の単語検索方式のシステム構成図であ
る。FIG. 1 is a system configuration diagram of a word search system according to an embodiment.

【図２】文字位置別単語テーブルのテーブル構成を示す
図である。FIG. 2 is a diagram showing a table structure of a character position-based word table.

【図３】表記テーブルのテーブルの構成を示す図であ
る。FIG. 3 is a diagram showing a configuration of a notation table.

【図４】文字数テーブルのテーブルの構成を示す図であ
る。FIG. 4 is a diagram showing a configuration of a character number table.

【図５】最小一致文字数テーブルの構成を示す図であ
る。FIG. 5 is a diagram showing a structure of a minimum matching character number table.

【図６】認識候補文字列の具体例を示す図である。FIG. 6 is a diagram showing a specific example of a recognition candidate character string.

【図７】一実施例の単語検索方式の動作を説明する図で
ある。FIG. 7 is a diagram illustrating an operation of a word search method according to an embodiment.

【図８】他の実施例の単語検索方式のシステム構成図で
ある。FIG. 8 is a system configuration diagram of a word search system of another embodiment.

【図９】他の実施例の単語検索方式の動作を説明する図
である。FIG. 9 is a diagram for explaining the operation of the word search method of another embodiment.

[Explanation of symbols]

１、１１単語検索処理部２、１２テーブル登録処理部３単語出現回数算出処理部４、１４出力単語決定処理部５、１５メモリ部６、１６文字位置別単語テーブル７、１７単語登録テーブル８、１８表記テーブル９、１９文字数テーブル１０、２０最小一致文字数テーブル
1, 11: word search processing unit
2, 12: table registration processing unit
3: word appearance number calculation processing unit
4, 14: output word determination processing unit
5, 15: memory unit
6, 16: character-position-based word table
7, 17: word registration table
8, 18: notation table
9, 19: character number table
10, 20: minimum matching character number table

Claims

[Claims]

1. A word collating device comprising: character-position-based word table means in which a word number is assigned to each predetermined word and the word number is registered for each character constituting the word; input means for inputting a recognition candidate character string; word number search means for searching the character-position-based word table means with the recognition candidate character string input from the input means and reading out word numbers for each character of the recognition candidate character string; word number appearance count calculation means for obtaining the number of appearances of each word number retrieved by the word number search means; and output word determination means for comparing, based on the appearance counts calculated by the word number appearance count calculation means, the number of constituent characters of a word whose word number has a large appearance count with that appearance count, and determining a word based on the resulting degree of coincidence.

2. A word collating device comprising: character-position-based word table means in which a word number is assigned to each predetermined word and the word number is registered for each character constituting the word; input means for inputting a recognition candidate character string; table registration processing means for searching the character-position-based word table means with the recognition candidate character string input from the input means, determining a table address using a hash function, and registering each retrieved word and its number of appearances in a table; and means for determining a word based on the degree of coincidence obtained by comparing the number of appearances of the word with the number of characters constituting the word.

3. The word collating device according to claim 2, further comprising: registration means for determining, using a hash function, the address of a word number obtained by searching the character-position-based word table, registering the word number in the table, and also registering it in a word table organized by the number of constituent characters of the word corresponding to the word number; and word collation continuation determination means for deciding, using the word table, whether to continue word collation at a predetermined character position.