JPS63268083A

JPS63268083A - Word recognizing device

Info

Publication number: JPS63268083A
Application number: JP62102098A
Authority: JP
Inventors: Yukikazu Kaburayama; 蕪山　幸和
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1987-04-27
Filing date: 1987-04-27
Publication date: 1988-11-04

Abstract

PURPOSE: To easily find the correct word even when a candidate character string contains a misrecognition, by searching the area of a word dictionary that corresponds to the number of characters and the complexity of the candidate character string, together with its vicinity. CONSTITUTION: A complexity table 7 defines the complexity of each character. A word dictionary 9 stores words in order of number of characters and word complexity. An area designation table 8 stores area information indicating where in the dictionary 9 the words for each number of characters and complexity are stored. A character recognition unit 4 refers to a recognition dictionary 5 for the character features extracted by a character feature extraction unit 3 and selects candidate characters for each character. A word matching unit 6 refers to the table 7 to calculate the complexity of the candidate character string and reads the corresponding area information from the table 8. The unit 6 then matches the words read from the corresponding area of the dictionary 9 and its vicinity against the candidate character string to find the correct word.

Description

[Detailed Description of the Invention]

[Summary]

This invention concerns a word recognition device that recognizes and outputs the word corresponding to a candidate character string. When the first character of a recognized candidate character string contains a minor misrecognition, searching the word dictionary sequentially from the area that begins with the misrecognized first character fails to find the correct word. To solve this problem, a word dictionary is provided that stores words in order of number of characters and complexity, and the area of the word dictionary corresponding to the number of characters and complexity of the candidate character string, together with its vicinity, is searched to find the correct word. As a result, the correct word can be found easily even if the candidate character string contains a minor misrecognition.

[Industrial application field]

The present invention relates to a word recognition device that is provided with a word dictionary storing words in order of number of characters and complexity, and that searches the area of the word dictionary corresponding to the number of characters and complexity of a candidate character string, together with its vicinity, to find the correct word.

[Prior art and problems to be solved by the invention]

Some character recognition devices hold a word dictionary and use it to improve the word recognition rate. A brief explanation follows with reference to FIGS. 7 and 8.

Characters written on a sheet of paper, for example “自動車” (automobile), are read with a scanner or the like and recognized character by character to generate candidate character strings such as those shown in FIG. 7.

The generated candidate character strings were matched against a word dictionary such as the one shown in FIG. 8, and a matching word or, when there was no match, a word with a high degree of similarity was output as the recognition result. The conventional word dictionary shown in FIG. 8 was ordered by the “number of characters” of each word and then by “character code”. Consequently, when the first character of the handwritten word is recognized as “自” and the word is known to have three characters, the correct word “自動車” can be found easily, as shown in FIG. 7.

However, as shown by the first to third candidates in FIG. 7, when the first character “自” of the candidate character string is for some reason misrecognized as “目”, “白”, or “日”, searching the word dictionary of FIG. 8 for three-character words that begin with any of these characters does not find the correct word.
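The failure described above can be reproduced with a minimal Python sketch. The dictionary contents, the function name `conventional_lookup`, and the use of Python string comparison to stand in for "character code" order are illustrative assumptions and are not taken from FIG. 8.

```python
# Hypothetical stand-in for the conventional dictionary of FIG. 8:
# words ordered by number of characters, then by character code.
WORDS = sorted(["自動車", "自転車", "日本語"], key=lambda w: (len(w), w))

def conventional_lookup(words, first_chars, length):
    """Return the words of the given length that begin with a candidate first character."""
    return [w for w in words if len(w) == length and w[0] in first_chars]

# The first character "自" was misrecognized as "目", "白", or "日".
hits = conventional_lookup(WORDS, ["目", "白", "日"], 3)
print(hits)  # the correct word "自動車" is not among the results
```

Because the lookup is keyed on the first character, a single misrecognition in that position removes the correct word from the search range entirely.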

[Means for solving problems]

To solve the above problem, the present invention provides: a character recognition unit 4 that recognizes each input character and generates a candidate character string; a complexity table 7 that defines the complexity of characters; a word dictionary 9 that stores words in order of number of characters and word complexity; an area designation table 8 that stores area information indicating in which area of the word dictionary 9 the words corresponding to a given number of characters and complexity are stored; and a word matching unit 6 that, based on the area information read from the area designation table 8 using the number of characters of the candidate character string and the complexity of that string calculated with reference to the complexity table 7, matches the words read from the corresponding area of the dictionary 9 and its vicinity against the candidate character string to find the corresponding word. The word found by the word matching unit 6 is output.

FIG. 1 shows the basic configuration of the present invention. The observation unit 1 in the figure reads a form bearing handwritten characters or the like using a scanner or similar device.

The control unit 2 performs various kinds of control.

The character feature extraction unit 3 extracts character features from the binary image signal read by the observation unit 1.

The character recognition unit 4 refers to the recognition dictionary 5 and generates candidate characters for each character, based on the character features extracted by the character feature extraction unit 3.

The word matching unit 6 finds, in the word dictionary 9, the word corresponding to the candidate character string recognized by the character recognition unit 4.

The complexity table 7 stores the complexity of each character.

The area designation table 8 stores area information indicating where in the word dictionary 9 the words are stored that correspond to the number of characters of a candidate character string and to its complexity as calculated using the complexity table 7.

The word dictionary 9 stores words in order of number of characters and complexity.

The output unit 10 outputs the word found in the word dictionary 9 by the word matching unit 6 as the character recognition result.

[Operation]

Next, the operation will be explained.

In FIG. 1, when a form bearing handwritten characters or the like is input to the observation unit 1, the characters are read and passed to the character feature extraction unit 3 via the control unit 2. The character feature extraction unit 3 extracts the features of each character. Based on the extracted features, the character recognition unit 4 refers to the recognition dictionary 5 and selects candidate characters for each character. The word matching unit 6, on receiving the selected candidate character string, refers to the complexity table 7 to calculate the complexity of the candidate character string, for example its total stroke count, and reads from the area designation table 8 the area information indicating where in the word dictionary 9 the words with that number of characters and that complexity are stored. It then reads words sequentially from the indicated area of the word dictionary 9 and its vicinity and matches them against the candidate character string.

As a result of the matching, the matching word or, when no word matches, a word with a high degree of similarity is output via the output unit 10 as the recognition result.

As described above, a word dictionary 9 in which words are ordered by number of characters and complexity is provided, and the area of the word dictionary 9 that stores the words corresponding to the number of characters and complexity of the candidate character string is searched together with its vicinity. The correct word can therefore be found even when the candidate character string recognized from handwriting contains a minor misrecognition, for example a character mistaken for one with a slightly different stroke count, and the recognition rate improves.

[Example]

Next, the configuration and operation of one embodiment of the present invention will be described in detail with reference to FIGS. 2 to 6.

FIG. 2(a) shows an example word, “自動車”, written on a form in handwriting.

FIG. 2(b) shows the candidate character strings, which are generated as follows. The handwritten characters “自動車” shown in FIG. 2(a) are read by the observation unit 1 of FIG. 1. The character feature extraction unit 3 extracts character features from the resulting binary image. Based on these features, the character recognition unit 4 refers to the recognition dictionary 5 and generates candidate character strings as the first to third candidates in descending order of similarity.

FIG. 3 shows an example of the complexity table. It illustrates the contents stored in the complexity table 7 of FIG. 1, in which the complexity of each character is represented, for example, by its stroke count. Looking up the first candidate “目動車” shown in FIG. 2 in this complexity table 7, the character “目” has complexity 5, “動” has complexity 11, and “車” has complexity 7, so the complexity of the first candidate is calculated as 5 + 11 + 7 = 23. The correct word “自動車” has 3 characters and a complexity of 24.
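The complexity calculation can be illustrated with a short Python sketch. The table below holds only the stroke counts needed for this example, and the names `COMPLEXITY_TABLE` and `complexity` are hypothetical; the actual contents of FIG. 3 are not reproduced in the text.

```python
# Illustrative stand-in for complexity table 7 (FIG. 3): stroke count per character.
COMPLEXITY_TABLE = {"自": 6, "目": 5, "動": 11, "車": 7}

def complexity(text):
    """Sum the per-character complexity values of a string."""
    return sum(COMPLEXITY_TABLE[ch] for ch in text)

print(complexity("目動車"))  # 23, the misrecognized first candidate
print(complexity("自動車"))  # 24, the correct word
```

The two values differ by only 1, which is why searching the neighboring complexity area can recover the correct word.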

FIG. 4 shows an example of a word dictionary in which words are stored in order of number of characters and complexity. It illustrates the contents stored in the word dictionary 9 of FIG. 1. In this word dictionary 9, words are ordered by their number of characters and then by the word complexity calculated using FIG. 3. Specifically, the one-character words come first, ordered from lowest to highest complexity; the two-character words come next, again ordered from lowest to highest complexity; and so on. The number of characters takes first priority because the number of characters handwritten on a form can be recognized almost with certainty. The complexity of each character, by contrast, is likely to be misrecognized, so the words are stored in order of complexity (for example, stroke count) so that the correct word can still be found easily when a candidate character string is generated with a slightly wrong value. With a word dictionary 9 ordered by number of characters and complexity, the correct word can be matched and found easily even when a candidate character string with a slightly wrong stroke count is generated, as in FIG. 2(b).
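The ordering described for the word dictionary amounts to a sort on the pair (number of characters, complexity). The following is a minimal Python sketch under that reading; the word list, the stroke counts, and the function name `build_word_dictionary` are illustrative assumptions rather than the contents of FIG. 4.

```python
# Illustrative stroke counts standing in for complexity table 7.
STROKES = {"自": 6, "目": 5, "白": 5, "日": 4, "本": 5, "語": 14, "動": 11, "車": 7}

def complexity(word):
    """Total stroke count of a word."""
    return sum(STROKES[ch] for ch in word)

def build_word_dictionary(words):
    """Order words by number of characters first, then by complexity."""
    return sorted(words, key=lambda w: (len(w), complexity(w)))

dictionary = build_word_dictionary(["自動車", "日本語", "日本", "自白", "目"])
print(dictionary)  # ['目', '日本', '自白', '日本語', '自動車']
```

Words of equal length and equal complexity end up adjacent, so a small error in the computed complexity moves the search position only slightly.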

FIG. 5 shows an example of the area designation table. It illustrates the contents stored in the area designation table 8 of FIG. 1. The first column from the left, “number of characters”, gives the number of characters of the candidate character string; the second column, “complexity”, gives its complexity; the third column, “first character number”, gives the number of the first entry in the word dictionary 9 at which words with that number of characters and that complexity are stored; and the fourth column, “end character number”, gives the number of the last entry. With this area designation table 8, the first and end numbers in the word dictionary 9 of the words matching the number of characters of the candidate character string and its complexity, obtained using the complexity table 7 of FIG. 3, can be looked up, so that matching is performed only against words with that number of characters and that complexity. If the correct word is not found at that complexity, the first and end numbers of the areas of the word dictionary 9 whose complexity is one greater or one smaller are read and matching is performed again, which makes it possible to find the correct word for a candidate character string recognized with a slightly different stroke count.
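One possible shape for the area designation table is a mapping from (number of characters, complexity) to a pair of dictionary positions. The Python sketch below assumes 0-based positions and a dictionary already sorted by that key; the names `build_area_table` and `areas_to_search`, the stroke counts, and the sample words are hypothetical and do not come from FIG. 5.

```python
# Illustrative stroke counts standing in for complexity table 7.
STROKES = {"自": 6, "日": 4, "本": 5, "語": 14, "動": 11, "転": 11, "車": 7}

def complexity(text):
    """Total stroke count of a string."""
    return sum(STROKES[ch] for ch in text)

def build_area_table(dictionary):
    """Map (number of characters, complexity) to (first, last) positions.

    Assumes the dictionary is already sorted by that same key.
    """
    table = {}
    for pos, word in enumerate(dictionary):
        key = (len(word), complexity(word))
        first, _ = table.get(key, (pos, pos))
        table[key] = (first, pos)
    return table

def areas_to_search(table, length, c):
    """Return the area for complexity c, then those for c + 1 and c - 1, if present."""
    keys = [(length, c), (length, c + 1), (length, c - 1)]
    return [table[k] for k in keys if k in table]

dictionary = ["日本語", "自動車", "自転車"]  # complexities 23, 24, 24
table = build_area_table(dictionary)
print(areas_to_search(table, 3, 23))  # [(0, 0), (1, 2)]
```

For a candidate with 3 characters and complexity 23, the search covers the exact area first and then the neighboring area for complexity 24, which is where “自動車” is stored.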

Next, the operation of the configuration of FIG. 1 will be described in detail with reference to FIG. 6.

The first step in FIG. 6 is character recognition. Based on the features extracted from the characters written on the form, the character recognition unit 4 of FIG. 1 refers to the recognition dictionary 5 and generates candidate character strings, for example the first to third candidates shown in FIG. 2(b).

The second step in FIG. 6 calculates the complexity C with reference to the complexity table 7. As explained using FIG. 3, the complexity of each character of the candidate character string is looked up and the values are summed; for example, the complexity of “目動車” is calculated as 23.

The third step in FIG. 6 refers to the area designation table 8 to obtain the corresponding area of the word dictionary 9. As explained using FIG. 5, the first and end numbers at which the words matching the number of characters and complexity of the candidate character string are stored in the word dictionary 9 are looked up, for example for the candidate “目動車” with 3 characters and complexity 23.

The fourth step in FIG. 6 refers to the word dictionary 9. Words are read sequentially from the area of the word dictionary 9 of FIG. 1 (for example, the word dictionary shown in FIG. 4) that stores the words matching the number of characters and complexity of the candidate character string, for example 3 characters and complexity 23 for the candidate “目動車”, and each is matched against the candidate character string.

The fifth step in FIG. 6 determines whether a word was found. The words read sequentially from the corresponding area in the fourth step are matched against the candidate character string to determine whether any of them matches or has a high degree of similarity. If YES, the result is output as the recognition result, for example printed out. If NO, the value of the complexity C is changed by +1 and by −1 and the lookup and matching steps are executed again, so that words in the neighboring complexity areas are matched. In this way the correct word can be found and output even when a handwritten character is read (recognized) with a slightly wrong stroke count, and the word recognition rate improves.
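The flow of FIG. 6 can be sketched end to end in Python. This is a sketch under several assumptions that the text does not specify: candidates are modeled as a list of candidate characters per position, "high similarity" is approximated by allowing at most one mismatching position, and a simple filter over the dictionary stands in for the area designation table lookup. The names `mismatches` and `recognize`, the stroke counts, and the sample words are hypothetical.

```python
# Illustrative stroke counts standing in for complexity table 7.
STROKES = {"自": 6, "目": 5, "白": 5, "日": 4, "本": 5, "語": 14,
           "動": 11, "転": 11, "車": 7}

def complexity(text):
    """Total stroke count of a string."""
    return sum(STROKES[ch] for ch in text)

def mismatches(word, candidates):
    """Count positions where the word's character is not among the candidates."""
    return sum(ch not in cands for ch, cands in zip(word, candidates))

def recognize(candidates, dictionary, max_mismatches=1):
    """candidates[i] lists the candidate characters for position i, best first."""
    top = "".join(cands[0] for cands in candidates)
    length, c = len(top), complexity(top)
    for target in (c, c + 1, c - 1):  # own area first, then the neighbors
        # Stand-in for reading the area given by the area designation table.
        area = [w for w in dictionary
                if len(w) == length and complexity(w) == target]
        for word in area:
            if mismatches(word, candidates) <= max_mismatches:
                return word
    return None

dictionary = ["日本語", "自動車", "自転車"]
candidates = [["目", "白", "日"], ["動"], ["車"]]
print(recognize(candidates, dictionary))  # 自動車
```

Here the first candidate “目動車” has complexity 23. The only word in that area, “日本語”, differs in two positions, so the search moves to complexity 24, where “自動車” differs only in the misrecognized first character.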

[Effect of the invention]

As explained above, the present invention provides a word dictionary that stores words in order of number of characters and complexity, and searches the area of the word dictionary corresponding to the number of characters and complexity of the candidate character string, together with its vicinity, to find the correct word. The correct word can therefore be found easily even when the candidate character string contains a minor misrecognition such as a wrong stroke count, and the word recognition rate improves.

[Brief explanation of drawings]

FIG. 1 is a diagram of the basic configuration of the present invention, FIG. 2 is an explanatory diagram of candidate character strings, FIG. 3 is an example of the complexity table, FIG. 4 is an example of the contents of the word dictionary, FIG. 5 is an example of the area designation table, FIG. 6 is a diagram explaining the operation of the present invention, FIG. 7 is an example of candidate characters, and FIG. 8 is an example of the contents of a conventional word dictionary.

In the figures, 4 denotes the character recognition unit, 6 the word matching unit, 7 the complexity table, 8 the area designation table, and 9 the word dictionary.

Claims

[Claims] A word recognition device that recognizes and outputs a word corresponding to a candidate character string, comprising: a character recognition unit (4) that recognizes each input character and generates a candidate character string; a complexity table (7) that defines the complexity of characters; a word dictionary (9) that stores words in order of number of characters and word complexity; an area designation table (8) that stores area information indicating in which area of the word dictionary (9) the words corresponding to the number of characters and complexity of a candidate character string are stored; and a word matching unit (6) that, based on the area information read from the area designation table (8) using the number of characters of the candidate character string and the complexity of that string calculated with reference to the complexity table (7), matches the words read from the corresponding area of the dictionary (9) and its vicinity against the candidate character string to find the corresponding word; the device being configured to output the word found by the word matching unit (6).