JPH0221387A

JPH0221387A - Word reader

Info

Publication number: JPH0221387A
Application number: JP63172208A
Authority: JP
Inventors: Yasuhiro Okada; 康裕岡田; Kozo Tomono; 伴野　浩三
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1988-07-11
Filing date: 1988-07-11
Publication date: 1990-01-24

Abstract

PURPOSE: To determine a word with high accuracy even when non-character noise or the like is present on a document. The rate of likelihood of noise is incorporated into the various parameters of the matching computation, and a word determining means ranks the words and selects those with a high degree of coincidence. CONSTITUTION: When non-character noise or the like is present in the image data of the characters, a pseudo-noise character detecting means 6 detects the rate of likelihood of noise. This rate is added as a parameter to the conventional parameters, and a computation is performed with a prescribed expression. A word determining means 59 determines which of the words 12 and 13 has the highest degree of coincidence with each of the recognition candidate character strings 10 and 11, and the highest-ranked words are taken to be the character strings entered on the document 1. Thus, even when noise or the like is present in the character entry frames of the document, words can be matched accurately.

Description

[Detailed description of the invention]

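[Industrial Field of Application] This invention relates to a word reading device that reads and recognizes words such as addresses and names. In particular, it relates to a word reading device that recognizes the characters making up a word one by one and corrects the word using the recognition results.

[Prior Art] In recent years, word reading devices have attracted attention as a means of high-volume, high-speed data entry into computers and the like. In Japanese, kanji, kana, alphanumerics and other characters are intricately mixed, so a word reading device capable of reading words with high accuracy is desired. The optical character reader (OCR) is known as such a device. Fig. 12 shows the configuration of a conventional word reading device of the kind described, for example, in "Knowledge Processing Method for OCR" (「ＯＣＲのための知識処理方式」), Kenkyū Jitsuyōka Hōkoku (Research and Practical Application Report), Vol. 32, No. 4 (1983). The components in the figure are as follows:
- 1 is a form on which characters are written.
- 2 is a scanning means that photoelectrically converts and reads the form 1.
- 3 is a character recognition means that segments the input characters one at a time, recognizes them and outputs recognition candidate characters.
- 4 is a word dictionary storing reference words.
- 5 is a word determining means that determines a word by matching the recognition candidate characters output from the character recognition means 3 against the words in the word dictionary 4.

The conventional word reading device is configured as described above. As the word reading result, it outputs the word preferentially selected from among those that consist of higher-ranked recognition candidate characters and are registered in the word dictionary 4.

Fig. 3 is an explanatory diagram showing an example of a typical form 1 to be read. In the figure, 7 is the input word string “オカヤマ” (Okayama), 8 is the input word string “オカヤ” (Okaya), and 9 is (non-character) noise adhering to the form. Figs. 4 and 5 show examples of the recognition candidate characters that the character recognition means 3 outputs for the characters on this form.

Fig. 4 shows the recognition candidate characters 10 that the character recognition means 3 outputs for each character of the input word string “オカヤマ” 7:
- For the first character “オ” there is one candidate, “オ”.
- For the second character “カ” there is one candidate, “カ”.
- For the third character “ヤ” there are two candidates, “マ” and “ヤ” (ranked first and second).
- For the fourth character “マ” there is one candidate, “ヌ”.

Fig. 5 shows the recognition candidate characters 11 that the character recognition means 3 outputs for each character of the input word string “オカヤ” 8 and for the noise 9 adhering to the form:
- For the first character “オ” there is one candidate, “オ”.
- For the second character “カ” there is one candidate, “カ”.
- For the third character “ヤ” there are two candidates, “マ” and “ヤ”.
- For the noise 9 there is one candidate, “ノ”.

Fig. 6 is an explanatory diagram showing an example of the words in a typical word dictionary 4 of katakana surnames in a word reading device. In the figure, 12 is the word “オカヤ” and 13 is the word “オカヤマ”.

Fig. 7 is an explanatory diagram of a word determination method used in the word determining means 5 of a word reading device, based on the general concept of dynamic programming. Suppose the character string $s_1, s_2, \dots, s_m$ forming a word is matched against $t_1, t_2, \dots, t_n$. Let $f(i,j)$ be the amount of deviation when the substrings $s_1, \dots, s_i$ and $t_1, \dots, t_j$ match best. In the figure:
- 14 is $f(i-1,j-1)$, 15 is $f(i-1,j)$, 16 is $f(i,j-1)$ and 17 is $f(i,j)$.
- 18 is the distance $d_1$ between $f(i-1,j-1)$ and $f(i,j)$.
- 19 is the distance $d_2$ between $f(i-1,j)$ and $f(i,j)$.
- 20 is the distance $d_3$ between $f(i,j-1)$ and $f(i,j)$.

Fig. 8 is an explanatory diagram illustrating the matching of the input word string “オカヤマ” 7 against the word “オカヤ” 12 in the word dictionary 4. In the figure:
- 21 is the distance between the first character of “オカヤマ” 7 and “オ” of the dictionary word “オカヤ” 12.
- 22 is the distance between the second character “カ” and “カ” of the dictionary word.
- 23 is the distance between the third character “ヤ” and “ヤ” of the dictionary word.
- 24 is the distance for a character insertion error.
- 25 is the path giving the shortest distance.

Fig. 9 is an explanatory diagram illustrating the matching of the input word string “オカヤマ” 7 against the word “オカヤマ” 13 in the word dictionary 4. In the figure:
- 26 is the distance between the first character of “オカヤマ” 7 and “オ” of the dictionary word “オカヤマ” 13.
- 27 is the distance between the second character “カ” and “カ” of the dictionary word.
- 28 is the distance between the third character “ヤ” and “ヤ” of the dictionary word.
- 29 is the distance between the fourth character “マ” and “マ” of the dictionary word.
- 30 is the path giving the shortest distance.

Fig. 10 is an explanatory diagram illustrating the matching of the input word string “オカヤ” 8 and the noise 9 against the word “オカヤ” 12 in the word dictionary 4. In the figure:
- 31 is the distance between the first character “オ” of “オカヤ” 8 and “オ” of the dictionary word “オカヤ” 12.
- 32 is the distance between the second character “カ” and “カ” of the dictionary word.
- 33 is the distance between the third character “ヤ” and “ヤ” in the word dictionary 4.
- 24 is, as in Fig. 8, the distance for a one-character insertion error.
- 34 is the path giving the shortest distance.

Fig. 11 is an explanatory diagram illustrating the matching of the input word string “オカヤ” 8 and the noise 9 adhering to the form against the word “オカヤマ” 13 in the word dictionary 4. In the figure:
- 35 is the distance between the first character “オ” of “オカヤ” 8 and “オ” of the dictionary word “オカヤマ” 13.
- 36 is the distance between the second character “カ” and “カ” of the dictionary word.
- 37 is the distance between the third character “ヤ” and “ヤ” of the dictionary word.
- 38 is the distance between the noise 9 and “マ” of the dictionary word.
- 39 is the path giving the shortest distance.

Next, the operation of the conventional device is described. The word determining means 5 matches the input word strings entered on the form of Fig. 3 against the words in the word dictionary 4 shown in Fig. 6. It ranks the words by their probability of identity using the following method, which is based on the concept of dynamic programming.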
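<Word Determination Method> Suppose the input word string $S = s_1, s_2, \dots, s_m$ is matched against a word $T = t_1, t_2, \dots, t_n$ in the word dictionary 4. A distance between the characters $s_i$ and $t_j$ is introduced and denoted $d_1$. The amount of deviation $f(i,j)$ when the substrings $s_1, \dots, s_i$ and $t_1, \dots, t_j$ match best is also introduced. It is calculated by the following recurrence:

$$f(0,0) = 0$$
$$f(i,j) = \min\{\, f(i-1,j) + d_2,\; f(i,j-1) + d_3,\; f(i-1,j-1) + d_1 \,\}$$

where:
- $d_1$ is the rank of $t_j$ among the recognition candidate characters for $s_i$ if $t_j$ is among them, and $P$ (a constant) otherwise.
- $d_2$ is a constant.
- $d_3$ is a constant.
- $\min(X, Y, Z)$ is the smallest of $X$, $Y$ and $Z$.

Fig. 7 shows the three possible steps into $f(i,j)$ 17:
- The path from $f(i-1,j)$ 15 means that $s_1, \dots, s_{i-1}$ have already been matched against $t_1, \dots, t_j$ with deviation $f(i-1,j)$ 15, and that $s_i$ is not matched against any character of $T$. A one-character insertion error is therefore allowed. Because $s_i$ is regarded as an extra character, the penalty $d_2$ 19 is imposed and $f(i,j) = f(i-1,j) + d_2$.
- The path from $f(i,j-1)$ 16 is analogous. Here $t_j$ is regarded as an extra character and is not matched against any character of the input word string $S$, so a one-character omission error is allowed. The penalty $d_3$ 20 is imposed instead and $f(i,j) = f(i,j-1) + d_3$.
- The path from $f(i-1,j-1)$ 14 is the one on which $s_i$ is matched against $t_j$. The degree of this match is $d_1$, as defined above.

The value $f(m,n)$ obtained by this recurrence is taken as the amount of deviation between the input word string $s_1, \dots, s_m$ and the dictionary word $t_1, \dots, t_n$. This deviation is computed for every word in the word dictionary. The words are then sorted in ascending order of deviation and output as candidate words.

The following example uses the predetermined constants $d_2 = 30$, $d_3 = 30$ and $P = 20$. The word determining means 5 matches the input word string “オカヤマ” 7 of Fig. 4 against the words in the word dictionary 4 shown in Fig. 6, beginning with the word “オカヤ” 12.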
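The recurrence can be sketched in Python, which gives a way to check the hand computation step by step. This is a minimal sketch under stated assumptions, not the patent's implementation. Each input character $s_i$ is represented here as a Python list of its recognition candidates in rank order, and the names `match_cost` and `deviation` are hypothetical. The constants are the example values $d_2 = 30$, $d_3 = 30$, $P = 20$.

```python
from typing import Sequence

# Example constants from the text.
D2 = 30  # d2: penalty for an extra input character s_i (insertion error)
D3 = 30  # d3: penalty for an unmatched dictionary character t_j (omission error)
P = 20   # P: value of d1 when t_j is not among the candidates for s_i


def match_cost(candidates: Sequence[str], t: str) -> int:
    """d1: 1-based rank of t among the recognition candidates, else P."""
    return candidates.index(t) + 1 if t in candidates else P


def deviation(cands: Sequence[Sequence[str]], word: str) -> int:
    """Return f(m, n) for candidate lists s_1..s_m and dictionary word t_1..t_n."""
    m, n = len(cands), len(word)
    f = [[0] * (n + 1) for _ in range(m + 1)]  # f[0][0] = 0
    for i in range(1, m + 1):                  # f(X, 0) = f(X-1, 0) + d2
        f[i][0] = f[i - 1][0] + D2
    for j in range(1, n + 1):                  # f(0, X) = f(0, X-1) + d3
        f[0][j] = f[0][j - 1] + D3
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            f[i][j] = min(
                f[i - 1][j] + D2,                                         # s_i inserted
                f[i][j - 1] + D3,                                         # t_j omitted
                f[i - 1][j - 1] + match_cost(cands[i - 1], word[j - 1]),  # s_i vs t_j
            )
    return f[m][n]


# Recognition candidates 10 of Fig. 4 for the input word string "オカヤマ" 7.
OKAYAMA_7 = [["オ"], ["カ"], ["マ", "ヤ"], ["ヌ"]]
print(deviation(OKAYAMA_7, "オカヤ"))    # 34
print(deviation(OKAYAMA_7, "オカヤマ"))  # 24
```

Indexing `word[j - 1]` treats each katakana character of the dictionary word as one element, which matches the character-by-character matching in the text.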
As shown in Fig. 8, the input word string $S$ is “オカヤマ” 7 and the dictionary word $T$ is “オカヤ” 12.

For $f(0,X)$ with $X \ge 1$, the recurrence reduces to $f(0,X) = f(0,X-1) + d_3$, so

$$f(0,1) = d_3 = 30,\quad f(0,2) = 2d_3 = 60,\quad f(0,3) = 3d_3 = 90.$$

Similarly, for $f(X,0)$ with $X \ge 1$, it reduces to $f(X,0) = f(X-1,0) + d_2$, so

$$f(1,0) = d_2 = 30,\quad f(2,0) = 2d_2 = 60,\quad f(3,0) = 3d_2 = 90,\quad f(4,0) = 4d_2 = 120.$$

Next, $s_1$ (“オ” of the input word 7 “オカヤマ”) is matched against $t_1$ (“オ” of the word 12 “オカヤ” in the word dictionary 4). $t_1$ is the first-ranked recognition candidate 10 for $s_1$, so $d_1 = 1$ (21 in Fig. 8), and

$$f(1,1) = \min\{f(0,1)+d_2,\ f(1,0)+d_3,\ f(0,0)+1\} = \min\{30+30,\ 30+30,\ 0+1\} = 1.$$

The remaining cells are computed in the same way.

Matching $s_1$ against $t_2$: $t_2$ is not among the recognition candidates 10 for $s_1$, so $d_1 = 20$ and

$$f(1,2) = \min\{f(0,2)+d_2,\ f(1,1)+d_3,\ f(0,1)+20\} = \min\{60+30,\ 1+30,\ 30+20\} = 31.$$

Matching $s_2$ against $t_1$: $t_1$ is not among the candidates 10 for $s_2$, so $d_1 = 20$ and

$$f(2,1) = \min\{f(1,1)+d_2,\ f(2,0)+d_3,\ f(1,0)+20\} = \min\{1+30,\ 60+30,\ 30+20\} = 31.$$

Matching $s_2$ against $t_2$: $t_2$ is the first-ranked candidate 10 for $s_2$, so $d_1 = 1$ (22 in Fig. 8) and

$$f(2,2) = \min\{f(1,2)+d_2,\ f(2,1)+d_3,\ f(1,1)+1\} = \min\{31+30,\ 31+30,\ 1+1\} = 2.$$

Matching $s_1$ against $t_3$: $t_3$ is not among the candidates 10 for $s_1$, so $d_1 = 20$ and

$$f(1,3) = \min\{f(0,3)+d_2,\ f(1,2)+d_3,\ f(0,2)+20\} = \min\{90+30,\ 31+30,\ 60+20\} = 61.$$

Matching $s_3$ against $t_1$: $t_1$ is not among the candidates 10 for $s_3$, so $d_1 = 20$ and

$$f(3,1) = \min\{f(2,1)+d_2,\ f(3,0)+d_3,\ f(2,0)+20\} = \min\{31+30,\ 90+30,\ 60+20\} = 61.$$

Matching $s_2$ against $t_3$: $t_3$ is not among the candidates 10 for $s_2$, so $d_1 = 20$ and

$$f(2,3) = \min\{f(1,3)+d_2,\ f(2,2)+d_3,\ f(1,2)+20\} = \min\{61+30,\ 2+30,\ 31+20\} = 32.$$

Matching $s_3$ against $t_2$: $t_2$ is not among the candidates 10 for $s_3$, so $d_1 = 20$ and

$$f(3,2) = \min\{f(2,2)+d_2,\ f(3,1)+d_3,\ f(2,1)+20\} = \min\{2+30,\ 61+30,\ 31+20\} = 32.$$

Matching $s_3$ against $t_3$: $t_3$ is the second-ranked candidate 10 for $s_3$, so $d_1 = 2$ (23 in Fig. 8) and

$$f(3,3) = \min\{f(2,3)+d_2,\ f(3,2)+d_3,\ f(2,2)+2\} = \min\{32+30,\ 32+30,\ 2+2\} = 4.$$

Matching $s_4$ against $t_1$: $t_1$ is not among the candidates 10 for $s_4$, so $d_1 = 20$ and

$$f(4,1) = \min\{f(3,1)+d_2,\ f(4,0)+d_3,\ f(3,0)+20\} = \min\{61+30,\ 120+30,\ 90+20\} = 91.$$

Matching $s_4$ against $t_2$: $t_2$ is not among the candidates 10 for $s_4$, so $d_1 = 20$ and

$$f(4,2) = \min\{f(3,2)+d_2,\ f(4,1)+d_3,\ f(3,1)+20\} = \min\{32+30,\ 91+30,\ 61+20\} = 62.$$

Matching $s_4$ against $t_3$: $t_3$ is not among the candidates 10 for $s_4$, so $d_1 = 20$ and

$$f(4,3) = \min\{f(3,3)+d_2,\ f(4,2)+d_3,\ f(3,2)+20\} = \min\{4+30,\ 62+30,\ 32+20\} = 34.$$

The deviation between the input word string “オカヤマ” 7 and the dictionary word “オカヤ” 12 is therefore $f(4,3) = 1 + 1 + 2 + 30 = 34$, the terms being the distances 21, 22, 23 and 24 in Fig. 8. Backtracking from the minimum deviation $f(4,3)$ yields a path through the points (4,3), (3,3), (2,2), (1,1) and (0,0). This is path 25. The word “オカヤマ” 13 in the word dictionary 4 is matched next.
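The backtracking step that recovers path 25 can be sketched as follows. This is a minimal illustration under stated assumptions. The full table $f$ is kept, and the walk goes from $(m, n)$ back to $(0, 0)$, at each step choosing a predecessor whose value plus the step cost reproduces the current cell. When several predecessors tie, this sketch prefers the diagonal step; the text does not specify a tie-break rule. The function names `dp_table` and `backtrack` are hypothetical.

```python
from typing import List, Sequence, Tuple

D2, D3, P = 30, 30, 20  # d2 (insertion), d3 (omission), P (t_j not a candidate)


def d1(candidates: Sequence[str], t: str) -> int:
    """1-based rank of t among the candidates, or P if absent."""
    return candidates.index(t) + 1 if t in candidates else P


def dp_table(cands: Sequence[Sequence[str]], word: str) -> List[List[int]]:
    """Full table f(i, j) of the conventional recurrence."""
    m, n = len(cands), len(word)
    f = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        f[i][0] = f[i - 1][0] + D2
    for j in range(1, n + 1):
        f[0][j] = f[0][j - 1] + D3
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            f[i][j] = min(f[i - 1][j] + D2,
                          f[i][j - 1] + D3,
                          f[i - 1][j - 1] + d1(cands[i - 1], word[j - 1]))
    return f


def backtrack(cands: Sequence[Sequence[str]], word: str) -> List[Tuple[int, int]]:
    """Return one minimum-deviation path from (0, 0) to (m, n)."""
    f = dp_table(cands, word)
    i, j = len(cands), len(word)
    path = [(i, j)]
    while (i, j) != (0, 0):
        if i > 0 and j > 0 and f[i][j] == f[i - 1][j - 1] + d1(cands[i - 1], word[j - 1]):
            i, j = i - 1, j - 1   # s_i matched with t_j
        elif i > 0 and f[i][j] == f[i - 1][j] + D2:
            i -= 1                # s_i treated as an extra (inserted) character
        else:
            j -= 1                # t_j treated as an omitted character
        path.append((i, j))
    path.reverse()
    return path


OKAYAMA_7 = [["オ"], ["カ"], ["マ", "ヤ"], ["ヌ"]]
print(dp_table(OKAYAMA_7, "オカヤ")[4][3])  # 34
print(backtrack(OKAYAMA_7, "オカヤ"))       # [(0, 0), (1, 1), (2, 2), (3, 3), (4, 3)]
```

The last step, from (3, 3) to (4, 3), is the insertion step that costs $d_2 = 30$ (distance 24 in Fig. 8): the fourth input character is skipped rather than matched.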
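As shown in Fig. 9, the input word string $S$ is “オカヤマ” 7 and the dictionary word $T$ is “オカヤマ” 13.

Matching $s_1$ (“オ” of the input word string “オカヤマ” 7) against $t_1$ (“オ” of the dictionary word “オカヤマ” 13): $t_1$ is the first-ranked recognition candidate 10 for $s_1$, so $d_1 = 1$ (26 in Fig. 9) and

$$f(1,1) = \min\{f(0,1)+d_2,\ f(1,0)+d_3,\ f(0,0)+1\} = \min\{30+30,\ 30+30,\ 0+1\} = 1.$$

Matching $s_2$ against $t_2$: $t_2$ is the first-ranked candidate 10 for $s_2$, so $d_1 = 1$ (27 in Fig. 9) and

$$f(2,2) = \min\{f(1,2)+d_2,\ f(2,1)+d_3,\ f(1,1)+1\} = \min\{31+30,\ 31+30,\ 1+1\} = 2.$$

Matching $s_3$ against $t_3$: $t_3$ is the second-ranked candidate 10 for $s_3$, so $d_1 = 2$ (28 in Fig. 9) and

$$f(3,3) = \min\{f(2,3)+d_2,\ f(3,2)+d_3,\ f(2,2)+2\} = \min\{32+30,\ 32+30,\ 2+2\} = 4.$$

Matching $s_4$ against $t_4$: $t_4$ is not among the candidates 10 for $s_4$, so $d_1 = 20$ (29 in Fig. 9) and

$$f(4,4) = \min\{f(3,4)+d_2,\ f(4,3)+d_3,\ f(3,3)+20\} = \min\{33+30,\ 34+30,\ 4+20\} = 24.$$

The deviation between the input word string “オカヤマ” 7 and the dictionary word “オカヤマ” 13 is therefore $f(4,4) = 1 + 1 + 2 + 20 = 24$, the terms being the distances 26, 27, 28 and 29 in Fig. 9. The shortest path is path 30.

To summarize the two results:
- The deviation between “オカヤマ” 7 and “オカヤ” 12 is 34.
- The deviation between “オカヤマ” 7 and “オカヤマ” 13 is 24.

The candidate words are output in ascending order of deviation, “オカヤマ” 13 first and “オカヤ” 12 second, so the correct word is obtained at the top rank.

Next, the word determining means matches the input word string “オカヤ” 8 of Fig. 5, together with the noise 9 adhering to the form, against the words in the word dictionary 4 shown in Fig. 6.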
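The above word determination method is first applied with the dictionary word “オカヤ” 12, as shown in Fig. 10. The deviation between the input word string “オカヤ” 8 plus the noise 9 and the dictionary word “オカヤ” 12 is $f(4,3) = 1 + 1 + 2 + 30 = 34$, the terms being the distances 31, 32, 33 and 24 in Fig. 10. The shortest path is path 34.

Next, matching is performed with the dictionary word “オカヤマ” 13, as shown in Fig. 11. The deviation between the input word string “オカヤ” 8 plus the noise 9 and the dictionary word “オカヤマ” 13 is $f(4,4) = 1 + 1 + 2 + 20 = 24$, the terms being the distances 35, 36, 37 and 38 in Fig. 11. The shortest path is path 39.

To summarize the two results:
- The deviation between “オカヤ” 8 plus noise 9 and “オカヤ” 12 is 34.
- The deviation between “オカヤ” 8 plus noise 9 and “オカヤマ” 13 is 24.

The candidate words are output in ascending order of deviation, “オカヤマ” 13 first and “オカヤ” 12 second. The correct word “オカヤ” 12 is therefore not obtained at the top rank.

[Problem to Be Solved by the Invention] The conventional word reading device is configured as described above. During word determination, the penalty for a character insertion error is fixed when words are matched. As a result, when non-character noise or the like is present on the character entry frames of a form, the noise is also treated as a target of character recognition, and the correct word cannot be obtained. The present invention has been made to solve this problem.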
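The problem can be reproduced with a short sketch of the conventional method. This is a minimal illustration, assuming the same list-of-candidates representation as in the earlier sketches. The names `deviation` and `rank_words` are hypothetical. Because every extra input character costs the same fixed $d_2 = 30$, the noise 9 (recognized as “ノ”) is cheaper to match wrongly against “マ” ($P = 20$) than to skip, and the wrong word wins.

```python
from typing import List, Sequence

# Fixed penalties of the conventional method (example values from the text).
D2, D3, P = 30, 30, 20


def d1(candidates: Sequence[str], t: str) -> int:
    """1-based rank of t among the recognition candidates, or P if absent."""
    return candidates.index(t) + 1 if t in candidates else P


def deviation(cands: Sequence[Sequence[str]], word: str) -> int:
    """Conventional f(m, n): every extra input character costs the same d2."""
    m, n = len(cands), len(word)
    f = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        f[i][0] = f[i - 1][0] + D2
    for j in range(1, n + 1):
        f[0][j] = f[0][j - 1] + D3
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            f[i][j] = min(f[i - 1][j] + D2,
                          f[i][j - 1] + D3,
                          f[i - 1][j - 1] + d1(cands[i - 1], word[j - 1]))
    return f[m][n]


def rank_words(cands: Sequence[Sequence[str]], dictionary: Sequence[str]) -> List[str]:
    """Dictionary words in ascending order of deviation (best candidate first)."""
    return sorted(dictionary, key=lambda word: deviation(cands, word))


DICTIONARY = ["オカヤ", "オカヤマ"]                       # words 12 and 13 of Fig. 6
OKAYA_8_NOISE_9 = [["オ"], ["カ"], ["マ", "ヤ"], ["ノ"]]  # candidates 11 of Fig. 5

print({w: deviation(OKAYA_8_NOISE_9, w) for w in DICTIONARY})  # {'オカヤ': 34, 'オカヤマ': 24}
print(rank_words(OKAYA_8_NOISE_9, DICTIONARY))                # ['オカヤマ', 'オカヤ']
```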
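The object of the invention is to provide a word reading device that can correctly match words even when noise or the like is present on the character entry frames of a form.

[Means for Solving the Problem] The invention concerns a word reading device comprising:
- a scanning means 2 that photoelectrically converts characters and the like on a form 1 to generate image data;
- a character recognition means 3 that performs primary character recognition on the image data of the characters and outputs recognition candidate character strings 10 and 11;
- a word dictionary 4 in which various words are registered in advance; and
- a word determining means 59 that matches the recognition candidate character strings 10 and 11 against words 12 and 13 selected from the word dictionary 4, performs a computation using a prescribed expression containing various parameters, and ranks and selects the words with a high probability of identity.

In this device, a pseudo-noise character detecting means 6 is provided that detects and outputs the non-character element rate for the image data of the characters and the like. The word determining means 59 incorporates this non-character element rate into the various parameters when performing the computation, and ranks and selects the words with a high probability of identity.

[Operation] If non-character noise or the like is present in the image data of the characters, the pseudo-noise character detecting means 6 detects the non-character element rate. This rate is added as a parameter to the conventional parameters, and the computation is performed with the prescribed expression. The word determining means 59 selects which of the words 12 and 13 has the highest probability of identity with the recognition candidate character strings 10 and 11. The higher-ranked word 12 or 13 is then determined to be the character string entered on the form 1.

[Embodiment] An embodiment of the invention is described below with reference to the drawings. The components in Fig. 1 are as follows:
- 1 is a form on which characters are written.
- 2 is a scanning means that reads the form 1 by photoelectric conversion.
- 3 is a character recognition means that segments the input characters one at a time, recognizes them and outputs primary recognition candidate characters.
- 4 is a word dictionary storing reference words.
- 6 is a pseudo-noise character detecting means that observes the image of each input character entered on the form and detects non-character images, for example characters predicted to be noise.
- 59 is a word determining means. It uses the pseudo-noise character information output from the pseudo-noise character detecting means 6 to match the recognition candidate characters output from the character recognition means 3 against the reference words in the word dictionary 4, and determines the word read.

The pseudo-noise character detecting means 6 individually measures the height and width of each piece of image data input from the scanning means 2 and outputs them.

The word reading device of the invention also takes as input the typical form 1 shown in Fig. 3. Given the form 1 of Fig. 3, the character recognition means 3 outputs the recognition candidate characters 10 of Fig. 4 and the recognition candidate characters 11 of Fig. 5. The words shown in Fig. 6 are registered in advance in the word dictionary 4.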
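The text states only that the pseudo-noise character detecting means 6 measures the height and width of each image, not how. One plausible way is to take the bounding box of the black pixels. The sketch below assumes that each segmented image is a binary bitmap, represented here as a list of strings with `#` for black pixels. That representation and the function name `bounding_box_size` are hypothetical choices made for illustration.

```python
from typing import Sequence, Tuple


def bounding_box_size(bitmap: Sequence[str], ink: str = "#") -> Tuple[int, int]:
    """Return (width, height) of the smallest box containing all ink pixels.

    An image with no ink pixels is reported as (0, 0).
    """
    rows = [r for r, line in enumerate(bitmap) if ink in line]
    cols = [c for line in bitmap for c, px in enumerate(line) if px == ink]
    if not rows:
        return 0, 0
    width = max(cols) - min(cols) + 1
    height = max(rows) - min(rows) + 1
    return width, height


# A made-up 5 x 6 image containing a small speck (not data from the patent).
speck = [
    "......",
    "..##..",
    "...#..",
    "......",
    "......",
]
print(bounding_box_size(speck))  # (2, 2)
```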
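Fig. 2 is an explanatory diagram illustrating an example of the pseudo-noise character detecting means 6 of the word reading device. It shows the width and height of the image of each character and of the noise (7, 8 and 9) in Fig. 3, which shows the typical form 1. In the figure:
- 40 is the width and height of the image of the first character “オ” of the input word string “オカヤマ”.
- 41, 42 and 43 are those of its second character “カ”, third character “ヤ” and fourth character “マ”.
- 44 is the width and height of the image of the first character “オ” of the input word string “オカヤ”.
- 45 and 46 are those of its second character “カ” and third character “ヤ”.
- 47 is the width and height of the image of the noise adhering to the form.

Fig. 10 is an explanatory diagram illustrating the matching of the input word string “オカヤ” 8 and the noise 9 of Fig. 3 against the word “オカヤ” 12 in the word dictionary 4 of Fig. 6, with the penalty for a character insertion error on a pseudo-noise character reduced. In the figure:
- 48 is the path on which the penalty for a character insertion error is reduced.
- 31 is the distance between the first character “オ” of “オカヤ” 8 and “オ” of the dictionary word “オカヤ” 12.
- 32 is the distance between the second character “カ” of the input word string and “カ” in the word dictionary 4.
- 33 is the distance between the third character “ヤ” of the input word string and “ヤ” in the word dictionary 4.
- 24 is the distance for a one-character insertion error.
- 34 is the path giving the shortest distance.

Fig. 11 is an explanatory diagram illustrating the matching of the input word string “オカヤ” 8 and the noise 9 of Fig. 3 against the word “オカヤマ” 13 in the word dictionary 4 of Fig. 6, with the penalty for a character insertion error on a pseudo-noise character reduced. In the figure:
- 48 is, as in Fig. 10, the path on which the penalty for a character insertion error is reduced.
- 35 is the distance between the first character “オ” of “オカヤ” 8 and “オ” of the dictionary word “オカヤマ” 13.
- 36 is the distance between the second character “カ” of the input word string and “カ” of the dictionary word.
- 37 is the distance between the third character “ヤ” of the input word string and “ヤ” of the dictionary word.
- 38 is the distance between the noise 9 adhering to the form and “マ” of the dictionary word.
- 39 is the path giving the shortest distance.

The word determining means 59 refers to the output signal shown in Fig. 2 from the pseudo-noise character detecting means 6 and uses the general dynamic programming method shown in Fig. 7.

Next, the operation is described. The detection criterion of the pseudo-noise character detecting means 6 is, for example, the following: "When both the width and the height of a character image are 4 or less, the character is regarded as a pseudo-noise character." This criterion is applied to the widths and heights of the input characters shown in Fig. 2:
- The widths and heights shown at 40 to 46 are all greater than 4, so none of the characters of the input word strings “オカヤマ” and “オカヤ” is a pseudo-noise character.
- The width and height shown at 47 are both below 4, so that image is judged to be a pseudo-noise character.

This judgment is then used by the word determining means 59 when it performs matching against the word dictionary 4.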
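The example criterion above amounts to a simple predicate on each image's size. A minimal sketch follows; the function names are hypothetical. The (width, height) values in the usage line are invented, because this text gives no numbers for Fig. 2: it states only that items 40 to 46 exceed 4 and item 47 is below 4. The unit of measure is not stated either.

```python
from typing import List, Sequence, Tuple

NOISE_LIMIT = 4  # example threshold from the text; the unit of measure is not stated


def is_pseudo_noise(width: int, height: int, limit: int = NOISE_LIMIT) -> bool:
    """Example criterion: width and height both `limit` or less -> pseudo-noise."""
    return width <= limit and height <= limit


def noise_flags(sizes: Sequence[Tuple[int, int]]) -> List[bool]:
    """Apply the criterion to the (width, height) of each segmented image."""
    return [is_pseudo_noise(w, h) for w, h in sizes]


# Hypothetical sizes for the three characters of "オカヤ" 8 and the noise 9.
print(noise_flags([(18, 20), (17, 19), (18, 21), (3, 2)]))  # [False, False, False, True]
```

A size test alone cannot tell noise from legitimately small marks. The text itself notes that the black-pixel count or the shape of the image could be used instead.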
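During this matching, the word determining means 59 reduces the penalty for a character insertion error on a pseudo-noise character, and this reduced penalty serves as the non-character element rate. The recurrence of the word determination method is modified accordingly, as follows.

<Word Determination Method> Suppose the input word string 7 or 8 from the form, $S = s_1, s_2, \dots, s_m$, is matched against a word $T = t_1, t_2, \dots, t_n$ in the word dictionary 4. A distance between the characters $s_i$ and $t_j$ is introduced and denoted $d_1$. The amount of deviation $f(i,j)$ when the substrings $s_1, \dots, s_i$ and $t_1, \dots, t_j$ match best is also introduced. It is calculated by the following recurrence:

$$f(0,0) = 0$$
$$f(i,j) = \min\{\, f(i-1,j) + d_2,\; f(i,j-1) + d_3,\; f(i-1,j-1) + d_1 \,\}$$

where:
- $d_1$ is the rank of $t_j$ among the recognition candidate characters for $s_i$ if $t_j$ is among them, and $P$ (a constant) otherwise.
- $d_2$ is a constant, specified per character as described below.
- $d_3$ is a constant.
- $\min(X, Y, Z)$ is the smallest of $X$, $Y$ and $Z$.

Fig. 7 shows the meaning of the two penalty steps:
- The path from $f(i-1,j)$ 15 to $f(i,j)$ 17 means that $s_1, \dots, s_{i-1}$ have already been matched against $t_1, \dots, t_j$ with deviation $f(i-1,j)$ 15, and that $s_i$ is not matched against any character of $T$. A one-character insertion error is therefore allowed. Because $s_i$ is regarded as an extra character, the penalty $d_2$ 19 is imposed and $f(i,j) = f(i-1,j) + d_2$.
- The path from $f(i,j-1)$ 16 to $f(i,j)$ 17 is analogous. Here $t_j$ is regarded as an extra character and is not matched against any character of the input word string $S$, so a one-character omission error is allowed. The penalty $d_3$ 20 is imposed instead and $f(i,j) = f(i,j-1) + d_3$.

Accordingly, the word determining means 59 specifies $d_2$ as follows:

$$d_2 = \begin{cases} d_{2a} & \text{if } s_i \text{ is a pseudo-noise character} \\ d_{2b} & \text{otherwise} \end{cases}$$

With $d_2$ specified in this way, the input word string “オカヤマ” 7 of Fig. 4 is first matched against the words of Fig. 6.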
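The modified recurrence differs from the conventional one only in that the insertion penalty $d_2$ now depends on whether $s_i$ was flagged as a pseudo-noise character. Below is a minimal Python sketch, assuming the flags are supplied as a list of booleans parallel to the candidate lists (a representation chosen here for illustration). The constants are the embodiment's example values $d_{2a} = 10$, $d_{2b} = 30$, $d_3 = 30$, $P = 20$.

```python
from typing import Sequence

D2A = 10  # d2a: insertion penalty when s_i is a pseudo-noise character
D2B = 30  # d2b: insertion penalty otherwise
D3 = 30   # d3: omission penalty
P = 20    # d1 when t_j is not among the candidates for s_i


def d1(candidates: Sequence[str], t: str) -> int:
    """1-based rank of t among the recognition candidates, or P if absent."""
    return candidates.index(t) + 1 if t in candidates else P


def deviation(cands: Sequence[Sequence[str]], noise: Sequence[bool], word: str) -> int:
    """f(m, n) with a per-character insertion penalty d2 chosen from noise[i-1]."""
    m, n = len(cands), len(word)

    def d2(i: int) -> int:
        return D2A if noise[i - 1] else D2B

    f = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        f[i][0] = f[i - 1][0] + d2(i)
    for j in range(1, n + 1):
        f[0][j] = f[0][j - 1] + D3
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            f[i][j] = min(f[i - 1][j] + d2(i),
                          f[i][j - 1] + D3,
                          f[i - 1][j - 1] + d1(cands[i - 1], word[j - 1]))
    return f[m][n]


OKAYA_8_NOISE_9 = [["オ"], ["カ"], ["マ", "ヤ"], ["ノ"]]
NOISE_FLAGS = [False, False, False, True]  # only the last image is pseudo-noise
print(deviation(OKAYA_8_NOISE_9, NOISE_FLAGS, "オカヤ"))    # 14
print(deviation(OKAYA_8_NOISE_9, NOISE_FLAGS, "オカヤマ"))  # 24
```

With no pseudo-noise characters flagged, every $d_2$ equals $d_{2b} = 30$, and the function reproduces the conventional values (34 and 24 for “オカヤマ” 7).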
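In this matching against the word dictionary 4, none of the characters of the input word string “オカヤマ” 7 is a pseudo-noise character. The recurrence therefore uses the predetermined values $d_2 = d_{2b} = 30$, $d_3 = 30$ and $P = 20$, and the operation is exactly the same as in the conventional example. The results are:
- Against “オカヤ” 12, the deviation is $f(4,3) = 1 + 1 + 2 + 30 = 34$, the terms being the distances 21, 22, 23 and 24 in Fig. 8. Backtracking from the minimum deviation $f(4,3)$ yields a path through (4,3), (3,3), (2,2), (1,1) and (0,0), which is path 25.
- Against “オカヤマ” 13, the deviation is $f(4,4) = 1 + 1 + 2 + 20 = 24$, the terms being the distances 26, 27, 28 and 29 in Fig. 9. The shortest path is path 30.

The word determining means 59 therefore outputs the candidate words in ascending order of deviation, “オカヤマ” 13 first and “オカヤ” 12 second, and the correct word is obtained at the top rank.

Now consider matching the input word string “オカヤ” 8 and the noise 9 of Fig. 5 against the words in the word dictionary 4 of Fig. 6. Here the noise 9 adhering to the form has been judged to be a pseudo-noise character. The non-character element rate is therefore included in the recurrence by setting $d_{2a} = 10$, $d_{2b} = 30$, $d_3 = 30$ and $P = 20$. The matching operation is described below.

For the word “オカヤ” 12 in the word dictionary 4 of Fig. 6, the input word string $S$ is “オカヤ” 8 plus the noise 9 of Fig. 5, and the dictionary word $T$ is “オカヤ” 12.

Matching $s_1$ (“オ” of the input word string “オカヤ” 8) against $t_1$ (“オ” of the dictionary word “オカヤ” 12): $t_1$ is the first-ranked recognition candidate 11 for $s_1$, so $d_1 = 1$ (31) and

$$f(1,1) = \min\{f(0,1)+d_{2b},\ f(1,0)+d_3,\ f(0,0)+1\} = \min\{30+30,\ 30+30,\ 0+1\} = 1.$$

Matching $s_2$ against $t_2$: $t_2$ is the first-ranked candidate 11 for $s_2$, so $d_1 = 1$ (32) and

$$f(2,2) = \min\{f(1,2)+d_{2b},\ f(2,1)+d_3,\ f(1,1)+1\} = \min\{31+30,\ 31+30,\ 1+1\} = 2.$$

Matching $s_3$ against $t_3$: $t_3$ is the second-ranked candidate 11 for $s_3$, so $d_1 = 2$ (33) and

$$f(3,3) = \min\{f(2,3)+d_{2b},\ f(3,2)+d_3,\ f(2,2)+2\} = \min\{32+30,\ 32+30,\ 2+2\} = 4.$$

Matching $s_4$ against $t_3$: $t_3$ is not among the candidates 11 for $s_4$, so $d_1 = 20$. Moreover, $s_4$ is a pseudo-noise character, so when computing $f(4,x)$ ($x = 0, 1, 2, 3$) the penalty on the character insertion error path 48 is $d_2 = d_{2a} = 10$. Hence

$$f(4,3) = \min\{f(3,3)+d_{2a},\ f(4,2)+d_3,\ f(3,2)+20\} = \min\{4+10,\ 42+30,\ 32+20\} = 14.$$

The deviation between “オカヤ” 8 plus noise 9 and the dictionary word “オカヤ” 12 is therefore $f(4,3) = 1 + 1 + 2 + 10 = 14$, the terms being the distances 31, 32, 33 and 24 in Fig. 10. The shortest path is path 34.

Now consider the word “オカヤマ” 13 in the word dictionary 4 of Fig. 6, matched against the input characters and noise of Fig. 5. As shown in Fig. 11, the input word string $S$ is “オカヤ” 8 plus the noise 9 and the dictionary word $T$ is “オカヤマ” 13.

Matching $s_1$ (“オ” of “オカヤ” 8) against $t_1$ (“オ” of “オカヤマ” 13): $t_1$ is the first-ranked candidate 11 for $s_1$, so $d_1 = 1$ (35) and

$$f(1,1) = \min\{f(0,1)+d_{2b},\ f(1,0)+d_3,\ f(0,0)+1\} = \min\{30+30,\ 30+30,\ 0+1\} = 1.$$

Matching $s_2$ against $t_2$: $t_2$ is the first-ranked candidate 11 for $s_2$, so $d_1 = 1$ (36) and

$$f(2,2) = \min\{f(1,2)+d_{2b},\ f(2,1)+d_3,\ f(1,1)+1\} = \min\{31+30,\ 31+30,\ 1+1\} = 2.$$

Matching $s_3$ against $t_3$: $t_3$ is the second-ranked candidate 11 for $s_3$, so $d_1 = 2$ (37) and

$$f(3,3) = \min\{f(2,3)+d_{2b},\ f(3,2)+d_3,\ f(2,2)+2\} = \min\{32+30,\ 32+30,\ 2+2\} = 4.$$

Matching $s_4$ against $t_4$: $t_4$ is not among the candidates 11 for $s_4$, so $d_1 = 20$. Moreover, $s_4$ is a pseudo-noise character, so when computing $f(4,x)$ the penalty on the character insertion error path 48 is $d_2 = d_{2a} = 10$. Hence

$$f(4,4) = \min\{f(3,4)+d_{2a},\ f(4,3)+d_3,\ f(3,3)+20\} = \min\{33+10,\ 14+30,\ 4+20\} = 24.$$

The deviation between “オカヤ” 8 plus noise 9 and the dictionary word “オカヤマ” 13 is therefore $f(4,4) = 1 + 1 + 2 + 20 = 24$, the terms being the distances 35, 36, 37 and 38 in Fig. 11. The shortest path is path 39.

To summarize the two results:
- The deviation between “オカヤ” 8 plus noise 9 and “オカヤ” 12 is 14.
- The deviation between “オカヤ” 8 plus noise 9 and “オカヤマ” 13 is 24.

The candidate words are output in ascending order of deviation, “オカヤ” first and “オカヤマ” second. The correct word “オカヤ” is therefore obtained at the top rank, an improvement made possible by the pseudo-noise character detecting means 6.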
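The embodiment's flow can be sketched end to end: detect pseudo-noise characters from image size, run the modified matching, then sort by deviation. This is a minimal sketch under stated assumptions. The (width, height) values are invented for illustration, since the text gives no numbers for Fig. 2. The function name `read_word` is hypothetical. Everything else uses the example values of the text.

```python
from typing import List, Sequence, Tuple

D2A, D2B, D3, P = 10, 30, 30, 20  # example constants from the embodiment
NOISE_LIMIT = 4                   # example size criterion (units not stated)


def is_pseudo_noise(width: int, height: int) -> bool:
    """Both width and height of the character image are NOISE_LIMIT or less."""
    return width <= NOISE_LIMIT and height <= NOISE_LIMIT


def d1(candidates: Sequence[str], t: str) -> int:
    """1-based rank of t among the recognition candidates, or P if absent."""
    return candidates.index(t) + 1 if t in candidates else P


def deviation(cands: Sequence[Sequence[str]], noise: Sequence[bool], word: str) -> int:
    """Modified f(m, n): skipping a pseudo-noise character costs d2a, not d2b."""
    m, n = len(cands), len(word)
    f = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        f[i][0] = f[i - 1][0] + (D2A if noise[i - 1] else D2B)
    for j in range(1, n + 1):
        f[0][j] = f[0][j - 1] + D3
    for i in range(1, m + 1):
        d2 = D2A if noise[i - 1] else D2B
        for j in range(1, n + 1):
            f[i][j] = min(f[i - 1][j] + d2,
                          f[i][j - 1] + D3,
                          f[i - 1][j - 1] + d1(cands[i - 1], word[j - 1]))
    return f[m][n]


def read_word(cands: Sequence[Sequence[str]],
              sizes: Sequence[Tuple[int, int]],
              dictionary: Sequence[str]) -> List[str]:
    """Rank dictionary words for one input field, best candidate first."""
    noise = [is_pseudo_noise(w, h) for w, h in sizes]
    return sorted(dictionary, key=lambda word: deviation(cands, noise, word))


DICTIONARY = ["オカヤ", "オカヤマ"]
# Hypothetical image sizes: three normal characters plus, in the first case, a small speck.
noisy = read_word([["オ"], ["カ"], ["マ", "ヤ"], ["ノ"]],
                  [(18, 20), (17, 19), (18, 21), (3, 2)], DICTIONARY)
clean = read_word([["オ"], ["カ"], ["マ", "ヤ"], ["ヌ"]],
                  [(18, 20), (17, 19), (18, 21), (19, 20)], DICTIONARY)
print(noisy)  # ['オカヤ', 'オカヤマ']
print(clean)  # ['オカヤマ', 'オカヤ']
```

The clean case shows that the change leaves ordinary inputs alone: only characters that pass the size test get the cheaper insertion penalty.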
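By using the pseudo-noise character detecting means 6, the image of each input character is observed and characters predicted to be noise are detected. Word matching is then performed with a reduced penalty for character insertion errors on the characters judged to be pseudo-noise characters. Words can therefore be matched accurately even when noise is present on the character entry frames of the form.

The embodiment admits the following variations:
- The width and height of the image are used as the detection criterion of the pseudo-noise character detecting means 6. Other detection methods, such as the number of black pixels in the image or the shape of the image, may also be used.
- DP matching based on the concept of dynamic programming is used as the word matching method of the word determining means 59. Other matching methods may be used.
- The rank of the recognition candidate character is used as the degree of match when the input word string is matched against the characters forming a dictionary word. Other information may be used, for example the probability, obtained when a character is recognized, that the input character is that character.
- Katakana surnames are described here, but the invention can also be applied to words in kanji, Latin letters and the like, and to words such as addresses and company names.
- $d_{2a} = 10$ is used as the non-character element rate. This value is not limiting, and another value may be used.

[Effects of the Invention] As described above, the invention applies to a word reading device comprising:
- a scanning means that photoelectrically converts characters and the like on a form to generate image data;
- a character recognition means that performs primary character recognition on that image data and outputs recognition candidate character strings;
- a word dictionary in which various words are registered in advance; and
- a word determining means that matches the recognition candidate character strings against words selected from the word dictionary, performs a computation using a prescribed expression containing various parameters, and ranks and selects the words with a high probability of identity.

According to the invention, a pseudo-noise character detecting means is provided that detects and outputs the non-character element rate for the image data of the characters and the like. The word determining means incorporates this rate into the parameters of the computation when ranking and selecting the words. Consequently, even when non-character noise or the like is present on the form, the entered character string can be matched accurately against the reference words, and words can be determined with high accuracy.

[Brief Description of the Drawings]
- Fig. 1 is a configuration diagram of a word reading device according to an embodiment of the invention.
- Fig. 2 shows the values of the non-character element rate.
- Fig. 3 is an explanatory diagram showing an example of a form for the word reading device.
- Figs. 4 and 5 are explanatory diagrams showing examples of input characters and recognition candidate characters in the word reading device.
- Fig. 6 is an explanatory diagram showing an example of word strings in the word dictionary of the word reading device.
- Fig. 7 is an explanatory diagram showing an example of the word determination method in the word reading device.
- Figs. 8, 9, 10 and 11 are explanatory diagrams showing examples of word matching in the word reading device.
- Fig. 12 is a configuration diagram of a conventional word reading device.

Reference numerals: 1: form; 2: scanning means; 3: character recognition means; 4: word dictionary; 6: pseudo-noise character detecting means; 7, 8: input characters; 10, 11: recognition candidate character strings; 12, 13: reference words; 59: word determining means.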
段。代理人　　大暑　増雄（ほか２名）鴇４０第１０目ｎｔＺｌ ′Ｉ！３８Ｉ！ｌ第９図２、発明の名称３．補正をする者単手語続読取補装正置書（自発）＆補正の対象明細書全文、図面の欄。Ｇ補正の内容＋１１明細書全文を別紙のとおり補正する。（２）図面、第２図を別紙のとおり補正する。以上代表者志岐守哉明　　細　　書　（全文補正）１、発明の名称単語読取装置２、特許請求の範囲帳票上の文字等を光電変換して画像データを発生する走
査手段と、該文字等の画像データについて文学圧１を行
い認識候補文字列を出力する文字認識手段と、予め各種
の単語が登録された単語辞書と、前記認識候補文字列に
ついて単語辞書から選出した単語と照合し各種パラメー
タを含む所定の演算式を用いて演算し、二致渡の高い単
語を順位付けて選定する単語決定手段とからなる単語読
取装置において。前記文字等の画像データについて２ム乙ス」Ｌし立α割
冶を検出して出力する疑似ノイズ性文字検出手段を設け
、前記単語決定手段がこの２Ｌイノ○筐旦り公凱査を前
記各種パラメータに組み入れて前記演算を行い、前記二
致崖の高い単語を順位つけて選定するようにしたことを
特徴とする単語読取装置。３、発明の詳細な説明〔産業上の利用分野〕この発明は、住所・氏名などの単語を読み取って認識す
る単語読取装置、特に単語を構成する文字を１文字ごと
に認識し、その認識結果を用いて単語を修正する単語読
取装置に関するものである。〔従来の技術〕近年、計算機などへの大量かつ高速のデータエントリー
手段として単語読取装置が注目されており、日本語にお
いては漢字やカナや英数字などが複雑に混在しているこ
とから、高精度に単語の読み取りができる単語読取装置
の開発が望まれている。この単語読取装置としては、光
学式文字読取装置（ＯＣＲ）が知られており、第１２図
に、例えば（ｒＯｃＲのための知識処理方式」　（研究
実用化報告Ｖｏ１．３２．Ｎｏ、４　（１９８３））な
どに示された従来の単語読取装置の構成図を示す。図において、１は文字が記載されている帳票、２は帳票
１を光電変換して読み取る走査手段、３は入力文字を１
文字ごとに切り出して認識し、認識候補文字を出力する
文字認識手段、４は基準となる単語を格納した単語辞書
、５は文字認識手段３から出力された認識候補文字と単
語辞書４内の単語とを照、合して単語を決定する単語決
定手段である。従来の単語読取装置は以上のように構成され、前記認識
候補文字のうち順位が上位で、かつ単語辞書４に登録さ
れた単語を優先的に選択したものを単語読取の結果とし
て出力するものである。第３図は読み取り対象となる一般的な帳票１の一例を示
す説明図である０図において、７は入力単語文字列“オ
カヤマ”、８は入力単語文字列“オカヤ１．９は帳票上
に付着した（非文字である）ノイズである。この帳票上の文字に対して、文字認識手段３から出力さ
れる認識候補文字の一例が第４図、第５図に示されてい
る。第４図は、入力単語文字列“オカヤマ”７の各文字に対
して文字認識手段３で出力される認識候補文字ｌＯを示
しており、入力単語第１文字目の“オ”に対しては、“
オ”の１個の認識候補文字が、第２文字目の“力゛に対
しては、“力”の１個の認識候補文字が、第３文字目の
“ヤ”に対しては、“マ”、“ヤ”の２個の認識候補文
字が、第４文字目の１７３に対しては、“ヌ”の１個の
認識候補文字が存在する。第５図は、入力単語文字列“
オカヤ”８の各文字と帳票上に付着したノイズ９に対し
て文字認識手段３で出力される認識候補文字１１を示し
ており、入力単語“オカヤ”の第１文字目の“オ”に対
しては、“オ”の１個の認識候補文字が、第２文字目の
“力”に対しては、“力”の１個の認識候補文字が、第
３文字目の９ヤ”に対しては、“７３．“ヤ”の２個の
認識候補文字が、帳票上に付着したノイズ９に対しては
、“ノ“の１個の認識候補文字が存在する。第６図は単語読取装置におけるカタカナの姓について一
般的な単語辞書４内の単語の一例を示す説明図である６
図において、１２は“オカヤ”なる単語、１３は“オカ
ヤマ”なる単語である。第７図は単語読取装置の単語決定手段５に使用される一
般的なダイナミックプログラミングの考え方を用いる単
語決定手法の説明図である。単語を構成する文字列ＳＩ
＋！！・・・Ｓ、とｔｌ＋ｌ！・・・ｔゎとの照合を行
う場合に部分文字列ＳＩ＋３！・・・Ｓｉとｊｌ＋Ｌ！
・・・１．とが最もよくマツチした時のずれの量をｆ（
！、ハとしたとき図において１４はｆ（トｔ、　Ｊ−１
＞　、　１５は’　（ｉ−１＋ｊ＞　＋　１６はｒ（ｉ
ｎｊ−１）　＋　　１７はｆ（ｉ、Ｊｌ　　で、あり、
１８はｆ　（ｉ−１・ｊ−１）　とｆ由ハ　との間の距
離ｄ、・　１９はｆ。−１，５，と’＋！＋ｊ）　　と
の間の距離ａｔ、２ｏはｆ（ｉｎ　ｊ−１１とｆ（！、
ｊ１　　との間の距離ｄ、である。第８図は、入力単語文字列“オカヤマ”７と単語辞書４
内の単語“オカヤ”１２との照合を説明した説明図であ
る０図において、２１は入力単語文字列“オカヤマ”７
の第１文字目の文字と単語辞書４内の単語“オカヤ”１
２の“第１との距離２２は第２文字目の文字力と単語辞
書４内の単語の“力”との距離、２３は第３文字目の文
字ヤと単語辞書内の単語の“ヤ”との距離、２４は文字
挿入膜りに対する距離であり、２５は最短距離を与える
経路である。第９図は、入力単語文字列“オカヤマ”７と単語辞書４
内の単語１オカヤマ”１３との照合を説明した説明図で
ある０図において、２６は入力単語文字列“オカヤマ”
７゛の第１文字目の文字と単語辞書内の単語“オカヤマ
”１３の“オ”との距離、２７は第２文字目の文字力と
単語辞書４内の単語の“力”との距離、２８は第３文字
目の文字ヤと単語辞書４内の単語の“ヤ”との距離、２
９は第４文字目の文字マと単語辞書内の単語の“マ“と
の距離であり、３０は最短距離を与える経路である。第１０図は、入力単語文字例“オカヤ”８及びノイズ９
と単語辞書４内の単語“オカヤ”１２との照合を説明し
た説明図である。図において、３１は入力単語文字列“
オカヤ”８の第１文字目の文字オと単語辞書内の単語“
オカヤ′″１２の“オ゛との距離、３２は入力単語文字
列の第２文字目の文字力と単語辞書４内の単語の“力”
との距離、３３は入力単語の第３文字目の文字ヤと単語
辞書４内の“ヤ”との距離、２４は第８図と同様に１文
字挿入誤りに対する距離であり、３４は最短距離を与え
る経路である。第１１図は、入力単語文字列“オカヤ”８及び帳票に付
着したノイズ９と単語辞書４内の単語“オカヤマ”１３
との照合を説明した説明図である。図において３５は入
力単語文字列“オカヤ”８の第１文字目の文字オと単語
辞書４内の単語“オカヤマ”１３の“オ”との距離、３
６は第２文字目の文字力と単語辞書４内の単語の“力”
との距離、３７は第３文字目の文字力と単語辞書内の単
語の“ヤ”との距離、３８は帳票に付着したノイズ９と
単語辞書４内の単語の“マ”との距離であり、３９は最
短距離を与える経路である。次に従来装置の動作について説明する。第３図中の帳票に記入された入力単語文字列と第６図に
示す単語辞書４内の単語との照合を単語決定手段５にて
行う、単語の一致度について順位決定はダイナミックプ
ログラミングの考え方を用いる下記の手法で行う。〈単語決定手法〉入力単語文字列７，８Ｓ−８１，Ｓ、・・・Ｓ、と単語
辞書４内の単語Ｔ＝ｔｌ、ｔｔ・・・＋７との照合を行
うとき、文字ＳＬとｔ、との距離という概念を導入し、
これをｄｌとする０部分文字列ＳＩ＋ｓ２・・・３１　
とＬＩ＋Ｌｆｆｉ・・・１．とが最もよくマツチしたと
きのずれの量ｆ　’（！、　ｊ）を導入し、これを次の
漸化式によって計算する。ｆ（Ｏ・０）＝Ｏｆ　（１，Ｊ）　　＝ｍ　ｉ　ｎ　（ｆ　＋ｉ−＋、ｊ
＋　　＋　ｄｔ　＋ｆ　（蓋・　ｊ−１１＋ａ　　３　
　・　　　ｆ　（五−Ｉ・　ｊ−雷）＋ｄ＋）ｄｌ　：
文字３！に対する認識候補文字中に１゜があればその順
位、認識候補文字中にｔ、がなければＰ（定数）ｄ２　：定数ｄ、：定数ｍ　ｌ　ｎ　（Ｘ、Ｙ、Ｚ）　：　Ｘ、　Ｙ、　　Ｚの
中の最小値すなわち、第７図に示すようにｆ　（ｉ−＋
、ｊ）１５からｆ（！、ハ　１７への経路はＳｌ＋５１
・・・Ｓ、−１と１．．１．・・・Ｌ、までの照合がす
んでいて、そのずれの程度がｆ　（ｉ−１゜）　１５で
あり文字ＳｉとＴの中の文字との照合を行わない（１文
字の挿入誤りを許す）ことを意味する。この場合、文字
３ｉは余分に入った文字であると考え、そのために罰則
ｄｚ１９を与え、’　（ｉ＋ｊ＋　　＝’　（ｉ−１＋
ｊ）　　＋ｄ２とする。ｆ（ｉ、ｊ−１，１６からｆ。、ハ　１７への経路も同
様で１．は余分な文字として入力単語文字列Ｓの中の文
字とは照合せず（１文字の欠如誤りを許す）そのかわり
に罰則ｄ３２０を与えｆ（ｉ、ｊ）　　＝　ｆ（！＋ｊ
−１）　＋ｄ　３　とする。ｆ　（ｉ−１＋ｊ−１）　　１４からｆ（ム＋ｊ）の経
路は文字Ｓムと文字１ｊの照合を行う経路で、この照合
の度合をｄ、とする。ｄｌは以下の値をとる。この漸化式により得られるｆ　（ｓｏ　ａｌ　を入力単
語文字列ｓｌ＋　　Ｓｌ・・・Ｓｌと単語辞書内の単語
ｔＩｎ１ｔ・・・ｔｌｌとのずれの量として、ずれ量を
単語辞書内の全単語に対して求め、ずれの量の小さい順
に並べ換え、候補単語を出力する。前記単語決定手法を用いて、予め定めたｄｔ＝３０、ｄ
The method is now illustrated with the predetermined values $d_2 = 30$, $d_3 = 30$, and $P = 20$. The word determining means 5 matches the input word string “オカヤマ” (Okayama) 7 in Fig. 4 against the words in word dictionary 4 shown in Fig. 6. It is first matched against the dictionary word “オカヤ” (Okaya) 12. As shown in Fig. 8, the input string $S$ is “オカヤマ” 7 and the dictionary word $T$ is “オカヤ” 12.

For $f(0,x)$ ($x \ge 1$), the recurrence reduces to $f(0,j) = f(0,j-1) + d_3$. This gives $f(0,1) = d_3 = 30$, $f(0,2) = 2d_3 = 60$, and $f(0,3) = 3d_3 = 90$. Similarly, for $f(x,0)$ ($x \ge 1$) it reduces to $f(i,0) = f(i-1,0) + d_2$. This gives $f(1,0) = d_2 = 30$, $f(2,0) = 2d_2 = 60$, $f(3,0) = 3d_2 = 90$, and $f(4,0) = 4d_2 = 120$.

$s_1$ (“オ” of input word 7) vs. $t_1$ (“オ” of dictionary word 12): $t_1$ is first among the recognition candidate characters 10 for $s_1$, so $d_1 = 1$ (21 in Fig. 8) and $f(1,1) = \min\{f(0,1)+d_2,\ f(1,0)+d_3,\ f(0,0)+1\} = \min\{30+30,\ 30+30,\ 1\} = 1$.

$s_1$ vs. $t_2$: $t_2$ is not among the candidates 10 for $s_1$, so $d_1 = 20$ and $f(1,2) = \min\{f(0,2)+d_2,\ f(1,1)+d_3,\ f(0,1)+20\} = \min\{60+30,\ 1+30,\ 30+20\} = 31$.

$s_2$ vs. $t_1$: $t_1$ is not among the candidates for $s_2$, so $d_1 = 20$ and $f(2,1) = \min\{f(1,1)+d_2,\ f(2,0)+d_3,\ f(1,0)+20\} = \min\{1+30,\ 60+30,\ 30+20\} = 31$.

$s_2$ vs. $t_2$: $t_2$ is first among the candidates 10 for $s_2$, so $d_1 = 1$ (22 in Fig. 8) and $f(2,2) = \min\{f(1,2)+d_2,\ f(2,1)+d_3,\ f(1,1)+1\} = \min\{31+30,\ 31+30,\ 1+1\} = 2$.

$s_1$ vs. $t_3$: $t_3$ is not among the candidates for $s_1$, so $d_1 = 20$ and $f(1,3) = \min\{f(0,3)+d_2,\ f(1,2)+d_3,\ f(0,2)+20\} = \min\{90+30,\ 31+30,\ 60+20\} = 61$.

$s_3$ vs. $t_1$: $t_1$ is not among the candidates for $s_3$, so $d_1 = 20$ and $f(3,1) = \min\{f(2,1)+d_2,\ f(3,0)+d_3,\ f(2,0)+20\} = \min\{31+30,\ 90+30,\ 60+20\} = 61$.

$s_2$ vs. $t_3$: $t_3$ is not among the candidates for $s_2$, so $d_1 = 20$ and $f(2,3) = \min\{f(1,3)+d_2,\ f(2,2)+d_3,\ f(1,2)+20\} = \min\{61+30,\ 2+30,\ 31+20\} = 32$.

$s_3$ vs. $t_2$: $t_2$ is not among the candidates for $s_3$, so $d_1 = 20$ and $f(3,2) = \min\{f(2,2)+d_2,\ f(3,1)+d_3,\ f(2,1)+20\} = \min\{2+30,\ 61+30,\ 31+20\} = 32$.

$s_3$ vs. $t_3$: $t_3$ is second among the candidates 10 for $s_3$, so $d_1 = 2$ (23 in Fig. 8) and $f(3,3) = \min\{f(2,3)+d_2,\ f(3,2)+d_3,\ f(2,2)+2\} = \min\{32+30,\ 32+30,\ 2+2\} = 4$.

$s_4$ vs. $t_1$: $t_1$ is not among the candidates 10 for $s_4$, so $d_1 = 20$ and $f(4,1) = \min\{f(3,1)+d_2,\ f(4,0)+d_3,\ f(3,0)+20\} = \min\{61+30,\ 120+30,\ 90+20\} = 91$.

$s_4$ vs. $t_2$: $t_2$ is not among the candidates for $s_4$, so $d_1 = 20$ and $f(4,2) = \min\{f(3,2)+d_2,\ f(4,1)+d_3,\ f(3,1)+20\} = \min\{32+30,\ 91+30,\ 61+20\} = 62$.

$s_4$ vs. $t_3$: $t_3$ is not among the candidates for $s_4$, so $d_1 = 20$ and $f(4,3) = \min\{f(3,3)+d_2,\ f(4,2)+d_3,\ f(3,2)+20\} = \min\{4+30,\ 62+30,\ 32+20\} = 34$.

That is, the deviation between the input string “オカヤマ” 7 and the dictionary word “オカヤ” 12 is $f(4,3) = 1$ (21 in Fig. 8) $+\ 1$ (22) $+\ 2$ (23) $+\ 30$ (24) $= 34$. Backtracking to find the path that gives the minimum deviation $f(4,3)$ yields the path through the points (4, 3), (3, 3), (2, 2), (1, 1), and (0, 0), that is, path 25.
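The hand computation for “オカヤ” can be checked mechanically with the sketch below (hypothetical names). It takes the $d_1$ values quoted in the text for “オカヤマ” 7 against “オカヤ” 12 as a 4 × 3 matrix. It then fills the whole table $f$ and backtracks from $f(4,3)$. When several predecessors give the same value, the sketch prefers the diagonal move. The text does not discuss ties, so that rule is an assumption.

```python
# d_1 values from the worked example: rows s_1..s_4, columns t_1..t_3.
D1_EXAMPLE = [
    [1, 20, 20],
    [20, 1, 20],
    [20, 20, 2],
    [20, 20, 20],
]


def fill_table(d1, d2, d3):
    """Fill the full DP table f for a matrix of matching costs
    (0-based: d1[i][j] is d_1 for s_{i+1} and t_{j+1})."""
    n_in, n_word = len(d1), len(d1[0])
    f = [[0] * (n_word + 1) for _ in range(n_in + 1)]
    for i in range(1, n_in + 1):
        f[i][0] = f[i - 1][0] + d2
    for j in range(1, n_word + 1):
        f[0][j] = f[0][j - 1] + d3
    for i in range(1, n_in + 1):
        for j in range(1, n_word + 1):
            f[i][j] = min(f[i - 1][j] + d2,
                          f[i][j - 1] + d3,
                          f[i - 1][j - 1] + d1[i - 1][j - 1])
    return f


def backtrack(f, d1, d2, d3):
    """Walk back from (I, J) to (0, 0) along one minimizing path."""
    i, j = len(f) - 1, len(f[0]) - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        if i > 0 and j > 0 and f[i][j] == f[i - 1][j - 1] + d1[i - 1][j - 1]:
            i, j = i - 1, j - 1     # s_i matched with t_j
        elif i > 0 and f[i][j] == f[i - 1][j] + d2:
            i -= 1                  # s_i treated as an inserted character
        else:
            j -= 1                  # t_j treated as an omitted character
        path.append((i, j))
    return path


if __name__ == "__main__":
    table = fill_table(D1_EXAMPLE, d2=30, d3=30)
    for row in table:
        print(row)
    print(backtrack(table, D1_EXAMPLE, d2=30, d3=30))
```

The last table row printed is `[120, 91, 62, 34]`, and the recovered path is `[(4, 3), (3, 3), (2, 2), (1, 1), (0, 0)]`. This agrees with the values and with path 25 in the text.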
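Next, the input is matched against the word “オカヤマ” 13 in word dictionary 4. As shown in Fig. 9, the input string $S$ is “オカヤマ” 7 and the dictionary word $T$ is “オカヤマ” 13.

$s_1$ (“オ” of input string 7) vs. $t_1$ (“オ” of dictionary word 13): $t_1$ is first among the recognition candidate characters 10 for $s_1$, so $d_1 = 1$ (26 in Fig. 9) and $f(1,1) = \min\{f(0,1)+d_2,\ f(1,0)+d_3,\ f(0,0)+1\} = \min\{30+30,\ 30+30,\ 1\} = 1$.

$s_2$ vs. $t_2$: $t_2$ is first among the candidates 10 for $s_2$, so $d_1 = 1$ (27 in Fig. 9) and $f(2,2) = \min\{f(1,2)+d_2,\ f(2,1)+d_3,\ f(1,1)+1\} = \min\{31+30,\ 31+30,\ 1+1\} = 2$.

$s_3$ vs. $t_3$: $t_3$ is second among the candidates 10 for $s_3$, so $d_1 = 2$ (28 in Fig. 9) and $f(3,3) = \min\{f(2,3)+d_2,\ f(3,2)+d_3,\ f(2,2)+2\} = \min\{32+30,\ 32+30,\ 2+2\} = 4$.

$s_4$ vs. $t_4$: $t_4$ is not among the candidates 10 for $s_4$, so $d_1 = 20$ (29 in Fig. 9) and $f(4,4) = \min\{f(3,4)+d_2,\ f(4,3)+d_3,\ f(3,3)+20\} = \min\{33+30,\ 34+30,\ 4+20\} = 24$.

That is, the deviation between “オカヤマ” 7 and “オカヤマ” 13 in word dictionary 4 is $f(4,4) = 1$ (26 in Fig. 9) $+\ 1$ (27) $+\ 2$ (28) $+\ 20$ (29) $= 24$, and the shortest path is 30.

From these results, the deviation between “オカヤマ” 7 and “オカヤ” 12 is 34, and the deviation between “オカヤマ” 7 and “オカヤマ” 13 is 24. The candidate words are output in ascending order of deviation, “オカヤマ” 13 and then “オカヤ” 12, so the correct word is ranked first.

Next, the word determining means matches the input word string “オカヤ” 8 written on the form in Fig. 5, together with noise 9 adhering to the form, against the words in word dictionary 4 shown in Fig. 6. Matching against the dictionary word “オカヤ” 12 by the above method gives the result shown in Fig. 10. The deviation between “オカヤ” 8 plus noise 9 and “オカヤ” 12 is $f(4,3) = 1$ (31 in Fig. 10) $+\ 1$ (32) $+\ 2$ (33) $+\ 30$ (34 in Fig. 10) $= 34$, and the shortest path is 34.

Next, matching against “オカヤマ” 13 in word dictionary 4 gives the result shown in Fig. 11. The deviation between “オカヤ” 8 plus noise 9 and “オカヤマ” 13 is $f(4,4) = 1$ (35 in Fig. 11) $+\ 1$ (36) $+\ 2$ (37) $+\ 20$ (38) $= 24$, and the shortest path is 39.

From these results, the deviation between “オカヤ” 8 plus noise 9 and “オカヤ” 12 is 34, and the deviation with “オカヤマ” 13 is 24. The candidate words are output in ascending order of deviation, “オカヤマ” 13 and then “オカヤ” 12, so the correct word “オカヤ” 12 is not ranked first.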
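The failure of the conventional method can be reproduced with a compact version of it (hypothetical names). The text gives the same ranks for the clean input “オカヤマ” 7 and for “オカヤ” 8 followed by noise 9: first, first, second, and not found. The hypothetical candidate lists below are therefore identical for both inputs. With a constant insertion penalty, the two inputs receive the same deviations and the same ranking.

```python
def deviation(candidates, word, d2=30, d3=30, p=20):
    """Conventional DP deviation f(I, J) with a constant insertion penalty d2."""
    f = [[0] * (len(word) + 1) for _ in range(len(candidates) + 1)]
    for i in range(1, len(candidates) + 1):
        f[i][0] = f[i - 1][0] + d2
    for j in range(1, len(word) + 1):
        f[0][j] = f[0][j - 1] + d3
    for i, cands in enumerate(candidates, start=1):
        for j, t in enumerate(word, start=1):
            d1 = cands.index(t) + 1 if t in cands else p   # rank, or P if absent
            f[i][j] = min(f[i - 1][j] + d2,
                          f[i][j - 1] + d3,
                          f[i - 1][j - 1] + d1)
    return f[-1][-1]


DICTIONARY = ["オカヤ", "オカヤマ"]

# Hypothetical candidate lists (best first), one list per segmented image.
# The text only says that no dictionary character appears among the
# candidates for the fourth image, so that list is left empty.
clean_input = [["オ"], ["カ"], ["マ", "ヤ"], []]   # "オカヤマ" 7
noisy_input = [["オ"], ["カ"], ["マ", "ヤ"], []]   # "オカヤ" 8, then noise 9

for label, cands in (("clean", clean_input), ("noisy", noisy_input)):
    scores = {w: deviation(cands, w) for w in DICTIONARY}
    print(label, sorted(DICTIONARY, key=scores.get), scores)
```

Both inputs yield deviations of 34 for “オカヤ” and 24 for “オカヤマ”. “オカヤマ” therefore comes first in both cases, which is correct for input 7 but wrong for input 8.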
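[Problem to be Solved by the Invention] The conventional word reading device is configured as described above. When it matches words during word determination, the penalty for character insertion errors is constant. As a result, the correct word cannot be obtained when non-character noise or the like is present on the entry character frame of the form.

The present invention was made to solve this problem. Its object is to provide a word reading device that can match words correctly even when noise or the like is present on the entry character frame of the form.

[Means for Solving the Problem] The invention concerns a word reading device comprising the following means:
- a scanning means 2 that photoelectrically converts characters and the like on a form 1 to generate image data;
- a character recognition means 3 that performs character recognition on the image data and outputs recognition candidate character strings 10, 11;
- a word dictionary 4 in which various words are registered in advance;
- a word determining means 59 that matches the recognition candidate character strings 10, 11 against words 12, 13 selected from word dictionary 4, computes with a predetermined arithmetic expression containing various parameters, and ranks and selects words with a high degree of coincidence.

In this device, a pseudo-noise character detecting means 6 is provided, which detects and outputs the noise-likelihood ratio of the image data of the characters and the like. The word determining means 59 incorporates this noise-likelihood ratio into the various parameters when performing the computation, and ranks and selects words with a high degree of coincidence.

[Operation] If non-character noise or the like is present in the image data of the characters, the pseudo-noise character detecting means 6 detects its noise-likelihood ratio. This ratio is added as a parameter to the conventional parameters, and the computation is performed with the predetermined arithmetic expression. The word determining means 59 then selects which of the words 12, 13 has the highest degree of coincidence with the recognition candidate character strings 10, 11. It determines that the high-ranked words 12, 13 are the character strings entered on form 1.

[Embodiment] An embodiment of the invention is described below with reference to the drawings. Fig. 1 shows the following elements:
- 1 is a form on which characters are written;
- 2 is a scanning means that reads form 1 by photoelectric conversion;
- 3 is a character recognition means that segments the input characters one at a time, recognizes them, and outputs first-order recognition candidate characters;
- 4 is a word dictionary storing reference words;
- 6 is a pseudo-noise character detecting means that observes the image of each input character written on the form and detects non-character images, for example characters predicted to be noise;
- 59 is a word determining means. It uses the pseudo-noise character information output from detecting means 6 to match the recognition candidate characters output from recognition means 3 against the reference words in word dictionary 4, and determines the word that was read.

The pseudo-noise character detecting means 6 individually measures the height and width of each piece of image data input from the scanning means 2 and outputs them.

The word reading device of the invention is also assumed to receive the general form 1 shown in Fig. 3. For the input of form 1 in Fig. 3, the character recognition means 3 outputs the recognition candidate characters 10 shown in Fig. 4 and the recognition candidate characters 11 shown in Fig. 5. Words such as those shown in Fig. 6 are registered in advance in word dictionary 4.

Fig. 2 is an explanatory diagram illustrating an example of the pseudo-noise character detecting means 6 of the word reading device. It shows the width and height of the images of the characters and noise 7, 8, and 9 in Fig. 3, which shows a general form 1. The reference numerals are as follows:
- 40: width and height of the image of the first character “オ” (o) of the input word string “オカヤマ” (Okayama);
- 41: width and height of the second character “カ” (ka);
- 42: width and height of the third character “ヤ” (ya);
- 43: width and height of the fourth character “マ” (ma);
- 44: width and height of the first character “オ” of the input word string “オカヤ” (Okaya);
- 45: width and height of the second character “カ”;
- 46: width and height of the third character “ヤ”;
- 47: width and height of the image of the noise adhering to the form.

Fig. 10 is an explanatory diagram of the matching between the input word string “オカヤ” 8 plus noise 9 in Fig. 3 and the word “オカヤ” 12 in word dictionary 4 of Fig. 6, when the penalty for character insertion errors on pseudo-noise characters is reduced. The reference numerals are as follows:
- 48: the path on which the insertion-error penalty is reduced;
- 31: distance between the first character “オ” of input string 8 and “オ” of dictionary word 12;
- 32: distance between the second character “カ” of the input string and “カ” in word dictionary 4;
- 33: distance between the third character “ヤ” of the input string and “ヤ” in word dictionary 4;
- 24: distance for a one-character insertion error;
- 34: the path giving the shortest distance.

Fig. 11 is the corresponding explanatory diagram for matching the input word string “オカヤ” 8 plus noise 9 in Fig. 3 against the word “オカヤマ” 13 in word dictionary 4 of Fig. 6, with the reduced penalty. The reference numerals are as follows:
- 48: as in Fig. 10, the path on which the insertion-error penalty is reduced;
- 35: distance between the first character “オ” of input string 8 and “オ” of dictionary word 13;
- 36: distance between the second character “カ” of the input string and “カ” of the dictionary word;
- 37: distance between the third character “ヤ” of the input string and “ヤ” of the dictionary word;
- 38: distance between noise 9 adhering to the form and “マ” of the dictionary word;
- 39: the path giving the shortest distance.

The word determining means 59 refers to the output signal of the pseudo-noise character detecting means 6 shown in Fig. 2, and uses the general dynamic programming method shown in Fig. 7.

Next, the operation is described. The detection criterion of the pseudo-noise character detecting means 6 is set, for example, as follows:

"When the width and height of a character image are both 4 or less, the character is regarded as a pseudo-noise character."

This criterion is applied to the widths and heights of the input characters shown in Fig. 2. The widths and heights 40 to 46 all exceed 4, so none of the characters of the input word strings “オカヤマ” and “オカヤ” is a pseudo-noise character. The width and height 47, however, are both below 4, so that image is judged to be a pseudo-noise character.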
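The example criterion can be sketched as a simple predicate on each segmented image's size. The actual widths and heights in Fig. 2 are not given in the text, and neither are their units. The sizes below are therefore invented for illustration, and the names `CharImage` and `is_pseudo_noise` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CharImage:
    """Size of one segmented image; units are not stated in the text."""
    width: int
    height: int


def is_pseudo_noise(img, limit=4):
    """Criterion from the text: width and height are both `limit` or less."""
    return img.width <= limit and img.height <= limit


# Hypothetical sizes for "オカヤ" 8 followed by noise 9 (cf. 44 to 47 in Fig. 2).
images = [CharImage(20, 22), CharImage(18, 21), CharImage(19, 20), CharImage(3, 2)]
print([is_pseudo_noise(img) for img in images])   # [False, False, False, True]
```

Because the criterion requires both dimensions to be small, a thin stroke such as a 2 × 30 bar is not flagged. Only compact specks are flagged.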
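Next, when matching against word dictionary 4, the word determining means 59 reduces the penalty for character insertion errors on pseudo-noise characters. That is, the recurrence of the word determination method is modified as follows.

<Word determination method>
Let an input word character string 7 or 8 from the form be $S = s_1 s_2 \cdots s_I$. When $S$ is matched against a word $T = t_1 t_2 \cdots t_J$ in word dictionary 4, a distance between characters $s_i$ and $t_j$ is introduced and denoted $d_1$. A deviation $f(i,j)$ is also introduced. It is the deviation obtained when the partial strings $s_1 s_2 \cdots s_i$ and $t_1 t_2 \cdots t_j$ are best matched, and it is computed with the following recurrence:

$$f(0,0) = 0$$
$$f(i,j) = \min\{\, f(i-1,j) + d_2,\ f(i,j-1) + d_3,\ f(i-1,j-1) + d_1 \,\}$$

where
$d_1$: the rank of $t_j$ if $t_j$ is among the recognition candidate characters for $s_i$, and $P$ (a constant) otherwise;
$d_2$: a constant, specified below;
$d_3$: a constant;
$\min(X, Y, Z)$: the smallest of $X$, $Y$, and $Z$.

As shown in Fig. 7, the path from $f(i-1,j)$ 15 to $f(i,j)$ 17 means the following. $s_1 \cdots s_{i-1}$ has already been matched against $t_1 \cdots t_j$ with deviation $f(i-1,j)$ 15. Character $s_i$ is not matched against any character of $T$, so a one-character insertion error is allowed. In this case $s_i$ is regarded as an extra character, a penalty $d_2$ 19 is imposed, and $f(i,j) = f(i-1,j) + d_2$.

Likewise, on the path from $f(i,j-1)$ 16 to $f(i,j)$ 17, $t_j$ is regarded as an extra character and is not matched against any character of the input string $S$, so a one-character omission error is allowed. Instead, a penalty $d_3$ 20 is imposed and $f(i,j) = f(i,j-1) + d_3$.

Accordingly, $d_2$ is specified as follows:

$$d_2 = \begin{cases} d_{2a} & \text{if } s_i \text{ is a pseudo-noise character} \\ d_{2b} & \text{otherwise} \end{cases}$$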
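A minimal sketch of the modified recurrence follows (hypothetical names). It assumes, as the numeric examples in the text indicate, that the insertion penalty for row $i$ is $d_{2a}$ when $s_i$ is flagged as a pseudo-noise character and $d_{2b}$ otherwise. The candidate lists used in the check are hypothetical and mirror the ranks quoted in the text.

```python
def d1(cands, t, p):
    """1-based rank of t among the candidates (best first), or P if absent."""
    return cands.index(t) + 1 if t in cands else p


def noise_aware_table(candidates, noise_flags, word, d2a, d2b, d3, p):
    """DP table for the modified recurrence: the insertion penalty for row i
    is d2a when s_i is flagged as a pseudo-noise character, else d2b."""
    n_in, n_word = len(candidates), len(word)
    d2 = [d2a if noisy else d2b for noisy in noise_flags]   # per-row penalty
    f = [[0] * (n_word + 1) for _ in range(n_in + 1)]
    for i in range(1, n_in + 1):
        f[i][0] = f[i - 1][0] + d2[i - 1]
    for j in range(1, n_word + 1):
        f[0][j] = f[0][j - 1] + d3
    for i in range(1, n_in + 1):
        for j in range(1, n_word + 1):
            f[i][j] = min(
                f[i - 1][j] + d2[i - 1],    # skip s_i (cheap if it is noise)
                f[i][j - 1] + d3,           # t_j missing from the input
                f[i - 1][j - 1] + d1(candidates[i - 1], word[j - 1], p),
            )
    return f


cands = [["オ"], ["カ"], ["マ", "ヤ"], []]      # hypothetical candidate lists
flags = [False, False, False, True]            # fourth image flagged as noise
table = noise_aware_table(cands, flags, "オカヤ", d2a=10, d2b=30, d3=30, p=20)
print(table[4])   # [100, 71, 42, 14]
```

Only the insertion term depends on the flag. Rows for ordinary characters are therefore computed exactly as in the conventional method, and the change affects only rows where a speck can be skipped cheaply.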
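The word determining means 59 then matches the input word string “オカヤマ” 7 in Fig. 4 against the words in word dictionary 4 of Fig. 6. None of the characters of “オカヤマ” 7 is a pseudo-noise character in this case. The recurrence therefore uses the predetermined values $d_2 = d_{2b} = 30$, $d_3 = 30$, and $P = 20$, and the operation is exactly the same as in the conventional example.

The deviation between “オカヤマ” 7 and the dictionary word “オカヤ” 12 is $f(4,3) = 1$ (21 in Fig. 8) $+\ 1$ (22) $+\ 2$ (23) $+\ 30$ (24) $= 34$. Backtracking to find the path that gives the minimum deviation $f(4,3)$ yields the path through the points (4, 3), (3, 3), (2, 2), (1, 1), and (0, 0), that is, path 25.

The deviation between “オカヤマ” 7 and “オカヤマ” 13 in word dictionary 4 is $f(4,4) = 1$ (26 in Fig. 9) $+\ 1$ (27) $+\ 2$ (28) $+\ 20$ (29) $= 24$, and the shortest path is 30.

The word determining means 59 therefore outputs the candidate words in ascending order of deviation, “オカヤマ” 13 of word dictionary 4 followed by “オカヤ” 12, so the correct word is ranked first.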
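When no image is flagged, the modified method reduces to the conventional one, because every row uses $d_{2b}$. The sketch below (hypothetical names, hypothetical candidate lists mirroring the ranks in the text) checks this for “オカヤマ” 7. It reproduces the deviations 34 and 24 and the correct ranking, whatever value $d_{2a}$ has.

```python
def deviation(candidates, word, noise_flags, d2a=10, d2b=30, d3=30, p=20):
    """f(I, J) of the modified recurrence (per-row insertion penalty)."""
    f = [[0] * (len(word) + 1) for _ in range(len(candidates) + 1)]
    for i in range(1, len(candidates) + 1):
        f[i][0] = f[i - 1][0] + (d2a if noise_flags[i - 1] else d2b)
    for j in range(1, len(word) + 1):
        f[0][j] = f[0][j - 1] + d3
    for i, cands in enumerate(candidates, start=1):
        d2 = d2a if noise_flags[i - 1] else d2b   # penalty for skipping s_i
        for j, t in enumerate(word, start=1):
            d1 = cands.index(t) + 1 if t in cands else p
            f[i][j] = min(f[i - 1][j] + d2, f[i][j - 1] + d3, f[i - 1][j - 1] + d1)
    return f[-1][-1]


def rank_words(candidates, noise_flags, dictionary):
    """Dictionary words in ascending order of deviation."""
    return sorted(dictionary, key=lambda w: deviation(candidates, w, noise_flags))


# Hypothetical candidate lists for "オカヤマ" 7; no image is flagged as noise.
clean_input = [["オ"], ["カ"], ["マ", "ヤ"], []]
no_noise = [False] * len(clean_input)
print(rank_words(clean_input, no_noise, ["オカヤ", "オカヤマ"]))   # ['オカヤマ', 'オカヤ']
```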
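Next, the word determining means matches the input word string “オカヤ” 8 and noise 9 in Fig. 5 against the words in word dictionary 4 of Fig. 6. Noise 9 adhering to the form has been judged to be a pseudo-noise character, so the noise-likelihood ratio is incorporated into the recurrence by setting $d_{2a} = 10$, $d_{2b} = 30$, $d_3 = 30$, and $P = 20$. The matching operation is described below.

The input word string $S$ is “オカヤ” 8 followed by noise 9 in Fig. 5, and the dictionary word $T$ is “オカヤ” 12 in word dictionary 4 of Fig. 6.

$s_1$ (“オ” of input string 8) vs. $t_1$ (“オ” of dictionary word 12): $t_1$ is first among the recognition candidate characters 11 for $s_1$, so $d_1 = 1$ (31) and $f(1,1) = \min\{f(0,1)+d_{2b},\ f(1,0)+d_3,\ f(0,0)+1\} = \min\{30+30,\ 30+30,\ 1\} = 1$.

$s_2$ vs. $t_2$: $t_2$ is first among the candidates 11 for $s_2$, so $d_1 = 1$ (32) and $f(2,2) = \min\{f(1,2)+d_{2b},\ f(2,1)+d_3,\ f(1,1)+1\} = \min\{31+30,\ 31+30,\ 1+1\} = 2$.

$s_3$ vs. $t_3$: $t_3$ is second among the candidates 11 for $s_3$, so $d_1 = 2$ (33) and $f(3,3) = \min\{f(2,3)+d_{2b},\ f(3,2)+d_3,\ f(2,2)+2\} = \min\{32+30,\ 32+30,\ 2+2\} = 4$.

$s_4$ vs. $t_3$: $t_3$ is not among the candidates 11 for $s_4$, so $d_1 = 20$. Moreover, $s_4$ is a pseudo-noise character, so every insertion step into row 4 (path 48) carries the penalty $d_2 = d_{2a} = 10$. This gives $f(4,3) = \min\{f(3,3)+d_{2a},\ f(4,2)+d_3,\ f(3,2)+20\} = \min\{4+10,\ 42+30,\ 32+20\} = 14$.

That is, the deviation between “オカヤ” 8 plus noise 9 and “オカヤ” 12 in word dictionary 4 is $f(4,3) = 1$ (31 in Fig. 10) $+\ 1$ (32) $+\ 2$ (33) $+\ 10$ (24) $= 14$, and the shortest path is 34.

Next, the word “オカヤマ” 13 in word dictionary 4 of Fig. 6 is matched against the input characters and noise of Fig. 5. As shown in Fig. 11, the input string $S$ is “オカヤ” 8 followed by noise 9, and the dictionary word $T$ is “オカヤマ” 13.

$s_1$ (“オ” of input string 8) vs. $t_1$ (“オ” of dictionary word 13): $t_1$ is first among the candidates 11 for $s_1$, so $d_1 = 1$ (35) and $f(1,1) = \min\{f(0,1)+d_{2b},\ f(1,0)+d_3,\ f(0,0)+1\} = \min\{30+30,\ 30+30,\ 1\} = 1$.

$s_2$ vs. $t_2$: $t_2$ is first among the candidates 11 for $s_2$, so $d_1 = 1$ (36) and $f(2,2) = \min\{f(1,2)+d_{2b},\ f(2,1)+d_3,\ f(1,1)+1\} = \min\{31+30,\ 31+30,\ 1+1\} = 2$.

$s_3$ vs. $t_3$: $t_3$ is second among the candidates 11 for $s_3$, so $d_1 = 2$ (37) and $f(3,3) = \min\{f(2,3)+d_{2b},\ f(3,2)+d_3,\ f(2,2)+2\} = \min\{32+30,\ 32+30,\ 2+2\} = 4$.

$s_4$ vs. $t_4$: $t_4$ is not among the candidates 11 for $s_4$, so $d_1 = 20$. Because $s_4$ is a pseudo-noise character, insertion steps into row 4 (path 48) again carry $d_2 = d_{2a} = 10$. This gives $f(4,4) = \min\{f(3,4)+d_{2a},\ f(4,3)+d_3,\ f(3,3)+20\} = \min\{33+10,\ 14+30,\ 4+20\} = 24$.

That is, the deviation between “オカヤ” 8 plus noise 9 and “オカヤマ” 13 in the word dictionary is $f(4,4) = 1$ (35 in Fig. 11) $+\ 1$ (36) $+\ 2$ (37) $+\ 20$ (38) $= 24$, and the shortest path is 39.

From these results, the deviation between “オカヤ” 8 plus noise 9 and “オカヤ” 12 is 14, and the deviation with “オカヤマ” 13 is 24. The candidate words are output in ascending order of deviation, “オカヤ” followed by “オカヤマ”, so the correct word “オカヤ” is ranked first.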
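The whole embodiment can be sketched end to end in a few lines. Each segmented image is flagged with the width/height criterion, its row gets the reduced insertion penalty, and the dictionary is ranked. The sizes and candidate lists are hypothetical, and so are the names `Segment`, `is_pseudo_noise`, and `deviation`. Setting `d2a` equal to `d2b` reverts to the conventional constant penalty for comparison.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One segmented image: its size and its recognition candidates (best first)."""
    width: int
    height: int
    candidates: list


def is_pseudo_noise(seg, limit=4):
    """Example criterion from the text: width and height both at most `limit`."""
    return seg.width <= limit and seg.height <= limit


def deviation(segments, word, d2a=10, d2b=30, d3=30, p=20):
    """Modified DP: insertion penalty d2a for pseudo-noise rows, else d2b."""
    f = [[0] * (len(word) + 1) for _ in range(len(segments) + 1)]
    for j in range(1, len(word) + 1):
        f[0][j] = f[0][j - 1] + d3
    for i, seg in enumerate(segments, start=1):
        d2 = d2a if is_pseudo_noise(seg) else d2b
        f[i][0] = f[i - 1][0] + d2
        for j, t in enumerate(word, start=1):
            d1 = seg.candidates.index(t) + 1 if t in seg.candidates else p
            f[i][j] = min(f[i - 1][j] + d2, f[i][j - 1] + d3, f[i - 1][j - 1] + d1)
    return f[-1][-1]


# Hypothetical sizes and candidates for "オカヤ" 8 followed by noise 9.
form_input = [
    Segment(20, 22, ["オ"]),
    Segment(18, 21, ["カ"]),
    Segment(19, 20, ["マ", "ヤ"]),
    Segment(3, 2, []),              # small speck: judged pseudo-noise
]
words = ["オカヤ", "オカヤマ"]
scores = {w: deviation(form_input, w) for w in words}
print(sorted(words, key=scores.get), scores)
```

With the reduced penalty, the scores are 14 for “オカヤ” and 24 for “オカヤマ”, so “オカヤ” comes first. Calling `deviation(form_input, w, d2a=30)` gives 34 and 24 instead, reproducing the conventional misranking.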
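In this way, the pseudo-noise character detecting means 6 observes the image of each input character and detects characters predicted to be noise. Word matching is then performed with a reduced penalty for character insertion errors on characters judged to be pseudo-noise characters. Words can therefore be matched correctly even when noise is present on the entry character frame of the form.

The embodiment can be varied in several ways:
- The embodiment uses the width and height of the image as the detection criterion of the pseudo-noise character detecting means 6. Other detection methods, such as the number of black pixels in the image or the shape of the image, may also be used.
- The embodiment uses DP matching, based on dynamic programming, as the word matching method of the word determining means 59. Other matching methods may also be used.
- The embodiment uses the rank of the recognition candidate character as the degree of matching between the input word string and the characters of the dictionary word. Other information may be used instead, for example the probability, obtained during character recognition, that the input character is that character.
- The embodiment deals with katakana surnames. The invention can also be applied to words in kanji, English letters, and so on, and to words such as addresses and company names.
- The embodiment uses $d_2 = d_{2a} = 10$ as the noise-likelihood ratio. The value is not limited to this, and other values may be used.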
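The black-pixel-count criterion mentioned as a variation can be sketched in the same spirit. The text gives no threshold, so `max_black` below is an assumed illustrative value. The binary image representation (rows of 0/1) and the function names are also hypothetical.

```python
def black_pixel_count(bitmap):
    """Number of black pixels in a binary image given as rows of 0/1."""
    return sum(sum(row) for row in bitmap)


def is_pseudo_noise_by_black_pixels(bitmap, max_black=8):
    """Hypothetical alternative criterion: very few black pixels => noise.
    The threshold max_black is an assumption, not a value from the text."""
    return black_pixel_count(bitmap) <= max_black


speck = [
    [0, 1, 0],
    [1, 1, 0],
]
stroke = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
]
print(is_pseudo_noise_by_black_pixels(speck), is_pseudo_noise_by_black_pixels(stroke))
```

Unlike the width/height rule, a pixel count also flags a sparse, spread-out smudge whose bounding box is large. The trade-off is that the threshold must be tuned to the scanning resolution.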
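[Effects of the Invention] As explained above, the invention concerns a word reading device comprising the following means:
- a scanning means that photoelectrically converts characters and the like on a form to generate image data;
- a character recognition means that performs character recognition on the image data and outputs recognition candidate character strings;
- a word dictionary in which various words are registered in advance;
- a word determining means that matches the recognition candidate character strings against words selected from the word dictionary, computes with a predetermined arithmetic expression containing various parameters, and ranks and selects words with a high degree of coincidence.

The device is provided with a pseudo-noise character detecting means that detects and outputs the noise-likelihood ratio of the image data. The word determining means incorporates this ratio into the various parameters when performing the computation, and ranks and selects words with a high degree of coincidence. Consequently, even when non-character noise is present on the form, the input character string is accurately matched against the reference words, and words can be determined with high accuracy.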
[Brief Description of the Drawings]
Fig. 1 is a block diagram of a word reading device according to an embodiment of the present invention.
Fig. 2 is an explanatory diagram of an example of the pseudo-noise character detecting means.
Fig. 3 is an explanatory diagram showing an example of a form for the word reading device.
Figs. 4 and 5 are explanatory diagrams showing examples of input characters and recognition candidate characters in the word reading device.
Fig. 6 is an explanatory diagram showing an example of the words in the word dictionary of the word reading device.
Fig. 7 is an explanatory diagram showing an example of the word determination method in the word reading device.
Figs. 8, 9, 10, and 11 are explanatory diagrams showing examples of word matching in the word reading device.
Fig. 12 is a configuration diagram of a conventional word reading device.

1: form; 2: scanning means; 3: character recognition means; 4: word dictionary; 6: pseudo-noise character detecting means; 7, 8: input characters; 10, 11: recognition candidate character strings; 12, 13: reference words; 59: word determining means.

Agent: Masuo Oiwa (and two others)

Procedural amendment (voluntary)
Title of the invention: Word reading device
Subject of the amendment: the entire text of the specification, and the drawings
Contents of the amendment: (1) The entire text of the specification is amended as shown in the attached sheet. (2) Fig. 2 of the drawings is amended as shown in the attached sheet.
Representative: Moriya Shiki

Specification (full-text amendment)
1. Title of the Invention
Word reading device
2. Claims
A word reading device comprising:
- a scanning means that photoelectrically converts characters and the like on a form to generate image data;
- a character recognition means that performs primary character recognition on the image data of the characters and the like and outputs recognition candidate character strings;
- a word dictionary in which various words are registered in advance;
- a word determining means that matches the recognition candidate character strings against words selected from the word dictionary, performs computation using a predetermined arithmetic expression containing various parameters, and ranks and selects words with a high degree of coincidence;

the word reading device being characterized in that a pseudo-noise character detecting means is provided that detects and outputs the non-character element ratio of the image data of the characters and the like, and in that the word determining means performs the computation with this non-character element ratio incorporated into the various parameters, and ranks and selects words with a high degree of coincidence.
3.
Detailed Description of the Invention
[Field of Industrial Application] This invention relates to a word reading device that reads and recognizes words such as addresses and names. In particular, it relates to a word reading device that recognizes the characters constituting a word one at a time and corrects the word using the recognition results.
[Prior Art] In recent years, word reading devices have attracted attention as a means of entering large amounts of data into computers and the like at high speed. Japanese text is a complex mixture of kanji, kana, alphanumeric characters and so on, so a word reading device that can read words with high accuracy is desired. The optical character reader (OCR) is known as such a word reading device. Fig. 12 shows the configuration of a conventional word reading device described, for example, in "A knowledge processing method for OCR" (Kenkyu Jitsuyoka Hokoku, Vol. 32, No. 4 (1983)). In the figure, 1 is a form on which characters are written, 2 is scanning means that photoelectrically converts and reads the form 1, and 3 is character recognition means that segments the input characters one at a time, recognizes them and outputs recognition candidate characters. 4 is a word dictionary storing reference words, and 5 is word determining means that determines a word by comparing the recognition candidate characters output from the character recognition means 3 with the words in the word dictionary 4. The conventional word reading device is configured as described above. It outputs as the word reading result the words that are registered in the word dictionary 4 and that are selected with priority given to the higher-ranked recognition candidate characters.
Fig. 3 is an explanatory diagram showing an example of a general form 1 to be read. In Fig. 3, 9 is (non-character) noise attached to the form. Figs. 4 and 5 show examples of the recognition candidate characters that the character recognition means 3 outputs for the characters on the form. Fig. 4 shows the recognition candidate characters 10 output by the character recognition means 3 for each character of the input word character string "Okayama" (オカヤマ) 7, as follows:
- first character "オ" (o): one recognition candidate character, "オ";
- second character "カ" (ka): one candidate, "カ";
- third character "ヤ" (ya): two candidates, "マ" (ma) and "ヤ", in rank order;
- fourth character "マ": one candidate, "ヌ" (nu).
Fig. 5 shows the recognition candidate characters 11 output by the character recognition means 3 for each character of the input word character string "Okaya" (オカヤ) 8 and for the noise 9 attached to the form, as follows:
- first character "オ": one candidate, "オ";
- second character "カ": one candidate, "カ";
- third character "ヤ": two candidates, "マ" and "ヤ";
- noise 9: one candidate, "ノ" (no).
Fig. 6 is an explanatory diagram showing an example of the words in a general word dictionary 4 of katakana surnames in the word reading device.
In the figure, 12 is the word "Okaya" (オカヤ) and 13 is the word "Okayama" (オカヤマ). Fig. 7 is an explanatory diagram of a word determination method based on the general concept of dynamic programming, as used in the word determining means 5 of the word reading device. Suppose the character strings $S = s_1 s_2 \ldots s_I$ and $T = t_1 t_2 \ldots t_J$ that constitute words are matched. Let $f(i, j)$ be the amount of deviation when the partial strings $s_1 s_2 \ldots s_i$ and $t_1 t_2 \ldots t_j$ are best matched. In the figure:
- 14 is $f(i-1, j-1)$, 15 is $f(i-1, j)$, 16 is $f(i, j-1)$ and 17 is $f(i, j)$;
- 18 is the distance $d_1$ between $f(i-1, j-1)$ and $f(i, j)$;
- 19 is the distance $d_2$ between $f(i-1, j)$ and $f(i, j)$;
- 20 is the distance $d_3$ between $f(i, j-1)$ and $f(i, j)$.
Fig. 8 is an explanatory diagram illustrating the matching of the input word character string "Okayama" 7 against the word dictionary 4.
Fig. 8 illustrates the matching with the word "Okaya" 12 in the word dictionary 4. In Fig. 8:
- 21 is the distance between the first character "オ" (o) of the input word character string "Okayama" 7 and the first character "オ" of the word "Okaya" 12;
- 22 is the distance between the second character "カ" (ka) and "カ" of the word in the word dictionary 4;
- 23 is the distance between the third character "ヤ" (ya) and "ヤ" of the word in the word dictionary 4;
- 24 is the distance for a character insertion error;
- 25 is the path that gives the shortest distance.
Fig. 9 is an explanatory diagram illustrating the matching of the input word character string "Okayama" 7 against the word dictionary 4.
Fig. 9 illustrates the matching with the word "Okayama" 13 in the word dictionary 4. In Fig. 9:
- 26 is the distance between the first character "オ" of the input word character string "Okayama" 7 and "オ" of the word "Okayama" 13;
- 27 is the distance between the second character "カ" and "カ" of the word in the word dictionary 4;
- 28 is the distance between the third character "ヤ" and "ヤ" of the word in the word dictionary 4;
- 29 is the distance between the fourth character "マ" (ma) and "マ" of the word in the word dictionary 4;
- 30 is the path that gives the shortest distance.
Fig. 10 is an explanatory diagram illustrating the matching of the input word character string "Okaya" 8 and the noise 9 against the word dictionary 4.
Fig. 10 illustrates the matching with the word "Okaya" 12 in the word dictionary 4. In Fig. 10:
- 31 is the distance between the first character "オ" of the input word character string "Okaya" 8 and "オ" of the word "Okaya" 12;
- 32 is the distance between the second character "カ" of the input word character string and "カ" of the word in the word dictionary 4;
- 33 is the distance between the third character "ヤ" of the input word and "ヤ" of the word in the word dictionary 4;
- 24 is the distance for a one-character insertion error, as in Fig. 8;
- 34 is the path that gives the shortest distance.
Fig. 11 is an explanatory diagram illustrating the matching of the input word character string "Okaya" 8 and the noise 9 attached to the form against the word "Okayama" 13 in the word dictionary 4.
In the figure, 35 is the distance between the first character "オ" of the input word character string "Okaya" 8 and "オ" of the word "Okayama" 13 in the word dictionary 4, 36 is the distance between the second character "カ" and "カ" of the word in the word dictionary 4,
37 is the distance between the third character "ヤ" and "ヤ" of the word in the word dictionary 4, 38 is the distance between the noise 9 attached to the form and "マ" of the word in the word dictionary 4, and 39 is the path that gives the shortest distance.

Next, the operation of the conventional device is explained. The word determining means 5 compares the input word character string written on the form shown in Fig. 3 with the words in the word dictionary 4 shown in Fig. 6. It does so by the following method.

<Word determination method> The input word character string 7, 8, $S = s_1 s_2 \ldots s_I$, is compared with a word $T = t_1 t_2 \ldots t_J$ in the word dictionary 4. For this comparison, the concept of a distance between the characters $s_i$ and $t_j$ is introduced and denoted $d_1$. The amount of deviation $f(i, j)$ is also introduced: it is the deviation obtained when the partial strings $s_1 s_2 \ldots s_i$ and $t_1 t_2 \ldots t_j$ are best matched, and it is calculated with the following recurrence:

$$f(0,0) = 0, \qquad f(i,j) = \min\{\, f(i-1,j) + d_2,\; f(i,j-1) + d_3,\; f(i-1,j-1) + d_1 \,\}$$

$d_1$: the rank of $t_j$ among the recognition candidate characters of $s_i$ if $t_j$ is among them, otherwise $P$ (a constant)
$d_2$: a constant
$d_3$: a constant
$\min(X, Y, Z)$: the minimum of $X$, $Y$ and $Z$

As shown in Fig. 7, each of the three paths into $f(i, j)$ 17 has the following meaning:
- Path from $f(i-1, j)$ 15: the matching of $s_1 s_2 \ldots s_{i-1}$ with $t_1 t_2 \ldots t_j$ has been completed with deviation $f(i-1, j)$ 15, and the character $s_i$ is not matched with any character of $T$ (a one-character insertion error is allowed). In this case $s_i$ is regarded as an extra character, so the penalty $d_2$ 19 is added and $f(i, j) = f(i-1, j) + d_2$.
- Path from $f(i, j-1)$ 16: this case is analogous. $t_j$ is an extra character that is not matched with any character of the input word character string $S$ (a one-character omission error is allowed), so the penalty $d_3$ 20 is added and $f(i, j) = f(i, j-1) + d_3$.
- Path from $f(i-1, j-1)$ 14: character $s_i$ is matched with character $t_j$, and the degree of this match is $d_1$, which takes the values defined above.
The value $f(I, J)$ obtained from this recurrence is taken as the amount of deviation between the input word character string $s_1 s_2 \ldots s_I$ and the word $t_1 t_2 \ldots t_J$ in the word dictionary. The amount of deviation is calculated for every word in the word dictionary, and the words are sorted in ascending order of deviation and output as candidate words. This word determination method is now explained for the predetermined values $d_2 = 30$, $d_3 = 30$ and $P = 20$.
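The recurrence above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation. The sketch assumes that each input character's candidate list is ordered best-first, so that list position gives the rank. The function name `deviation` is hypothetical, and the defaults `d2=30`, `d3=30`, `p=20` are simply the example values used in this section.

```python
def deviation(candidates, word, d2=30, d3=30, p=20):
    """Conventional DP word-matching deviation f(I, J).

    candidates[i] holds the recognition candidates for input character
    s_(i+1), best first; word is the dictionary word T = t_1 ... t_J.
    """
    n, m = len(candidates), len(word)
    f = [[0] * (m + 1) for _ in range(n + 1)]      # f[0][0] = 0
    for i in range(1, n + 1):
        f[i][0] = f[i - 1][0] + d2                 # s_i left unmatched (insertion)
    for j in range(1, m + 1):
        f[0][j] = f[0][j - 1] + d3                 # t_j left unmatched (omission)
    for i in range(1, n + 1):
        cands = candidates[i - 1]
        for j in range(1, m + 1):
            t = word[j - 1]
            # d1: 1-based rank of t_j among the candidates of s_i, else P
            d1 = cands.index(t) + 1 if t in cands else p
            f[i][j] = min(f[i - 1][j] + d2,        # insertion error
                          f[i][j - 1] + d3,        # omission error
                          f[i - 1][j - 1] + d1)    # s_i matched with t_j
    return f[n][m]


# Recognition candidates for "Okayama" 7 (Fig. 4), best first.
okayama_input = [["オ"], ["カ"], ["マ", "ヤ"], ["ヌ"]]
print(deviation(okayama_input, "オカヤ"))    # 34, deviation from "Okaya" 12
print(deviation(okayama_input, "オカヤマ"))  # 24, deviation from "Okayama" 13
```

Using the 1-based rank directly as $d_1$ follows the definition in the text. With these values the sketch reproduces the hand-computed deviations 34 and 24 worked out below.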
The word determining means 5 compares the input word character string "Okayama" 7 shown in Fig. 4 with the words in the word dictionary 4 shown in Fig. 6. As shown in Fig. 8, let the input word character string $S$ be "Okayama" 7 and the word $T$ in the word dictionary be "Okaya" 12.

For the values $f(0, x)$ $(x \ge 1)$, the recurrence reduces to $f(0, x) = f(0, x-1) + d_3$, so $f(0,1) = d_3 = 30$, $f(0,2) = 2d_3 = 60$ and $f(0,3) = 3d_3 = 90$. Similarly, for the values $f(x, 0)$ $(x \ge 1)$, the recurrence reduces to $f(x, 0) = f(x-1, 0) + d_2$, so $f(1,0) = d_2 = 30$, $f(2,0) = 2d_2 = 60$, $f(3,0) = 3d_2 = 90$ and $f(4,0) = 4d_2 = 120$.

Next, $s_1$ ("オ" of the input word "Okayama" 7) is compared with $t_1$ ("オ" of the word "Okaya" 12 in the word dictionary 4). $t_1$ is ranked first among the recognition candidate characters 10 of $s_1$, so $d_1 = 1$ (21 in Fig. 8) and $f(1,1) = \min(f(0,1) + d_2,\ f(1,0) + d_3,\ f(0,0) + 1) = \min(30+30,\ 30+30,\ 1) = 1$.
Comparing $s_1$ and $t_2$: $t_2$ is not among the recognition candidate characters 10 of $s_1$, so $d_1 = 20$ and $f(1,2) = \min(f(0,2) + d_2,\ f(1,1) + d_3,\ f(0,1) + 20) = \min(60+30,\ 1+30,\ 30+20) = 31$.
Comparing $s_2$ and $t_1$: $t_1$ is not among the recognition candidate characters 10 of $s_2$, so $d_1 = 20$ and $f(2,1) = \min(f(1,1) + d_2,\ f(2,0) + d_3,\ f(1,0) + 20) = \min(1+30,\ 60+30,\ 30+20) = 31$.
Comparing $s_2$ and $t_2$: $t_2$ is ranked first among the recognition candidate characters 10 of $s_2$, so $d_1 = 1$ (22 in Fig. 8) and $f(2,2) = \min(f(1,2) + d_2,\ f(2,1) + d_3,\ f(1,1) + 1) = \min(31+30,\ 31+30,\ 1+1) = 2$.
Comparing $s_1$ and $t_3$: $t_3$ is not among the recognition candidate characters 10 of $s_1$, so $d_1 = 20$ and $f(1,3) = \min(f(0,3) + d_2,\ f(1,2) + d_3,\ f(0,2) + 20) = \min(90+30,\ 31+30,\ 60+20) = 61$.
Comparing $s_3$ and $t_1$: $t_1$ is not among the recognition candidate characters 10 of $s_3$, so $d_1 = 20$ and $f(3,1) = \min(f(2,1) + d_2,\ f(3,0) + d_3,\ f(2,0) + 20) = \min(31+30,\ 90+30,\ 60+20) = 61$.
Comparing $s_2$ and $t_3$: $t_3$ is not among the recognition candidate characters 10 of $s_2$, so $d_1 = 20$ and $f(2,3) = \min(f(1,3) + d_2,\ f(2,2) + d_3,\ f(1,2) + 20) = \min(61+30,\ 2+30,\ 31+20) = 32$.
Comparing $s_3$ and $t_2$: $t_2$ is not among the recognition candidate characters 10 of $s_3$, so $d_1 = 20$ and $f(3,2) = \min(f(2,2) + d_2,\ f(3,1) + d_3,\ f(2,1) + 20) = \min(2+30,\ 61+30,\ 31+20) = 32$.
Comparing $s_3$ and $t_3$: $t_3$ is ranked second among the recognition candidate characters 10 of $s_3$, so $d_1 = 2$ (23 in Fig. 8) and $f(3,3) = \min(f(2,3) + d_2,\ f(3,2) + d_3,\ f(2,2) + 2) = \min(32+30,\ 32+30,\ 2+2) = 4$.
Comparing $s_4$ and $t_1$: $t_1$ is not among the recognition candidate characters 10 of $s_4$, so $d_1 = 20$ and $f(4,1) = \min(f(3,1) + d_2,\ f(4,0) + d_3,\ f(3,0) + 20) = \min(61+30,\ 120+30,\ 90+20) = 91$.
Comparing $s_4$ and $t_2$: $t_2$ is not among the recognition candidate characters 10 of $s_4$, so $d_1 = 20$ and $f(4,2) = \min(f(3,2) + d_2,\ f(4,1) + d_3,\ f(3,1) + 20) = \min(32+30,\ 91+30,\ 61+20) = 62$.
Comparing $s_4$ and $t_3$: $t_3$ is not among the recognition candidate characters 10 of $s_4$, so $d_1 = 20$ and $f(4,3) = \min(f(3,3) + d_2,\ f(4,2) + d_3,\ f(3,2) + 20) = \min(4+30,\ 62+30,\ 32+20) = 34$.

The amount of deviation $f(4,3)$ between the input word character string "Okayama" 7 and the word "Okaya" 12 in the word dictionary is therefore $f(4,3) = 1$ (21 in Fig. 8) $+ 1$ (22 in Fig. 8) $+ 2$ (23 in Fig. 8) $+ 30$ (24 in Fig. 8) $= 34$. Backtracking the path that gives the minimum deviation $f(4,3)$ yields the path 25 through the points (4, 3), (3, 3), (2, 2), (1, 1) and (0, 0).

Next, the input is compared with the word "Okayama" 13 in the word dictionary 4. As shown in Fig. 9, let the input word character string $S$ be "Okayama" 7 and the word $T$ in the word dictionary be "Okayama" 13. Comparing $s_1$ ("オ" of the input word character string "Okayama" 7) with $t_1$ ("オ" of the word "Okayama" 13 in the word dictionary),
$t_1$ is ranked first among the recognition candidate characters 10 of $s_1$, so $d_1 = 1$ (26 in Fig. 9) and $f(1,1) = \min(f(0,1) + d_2,\ f(1,0) + d_3,\ f(0,0) + 1) = \min(30+30,\ 30+30,\ 1) = 1$.
Similarly, comparing $s_2$ and $t_2$: $t_2$ is ranked first among the recognition candidate characters 10 of $s_2$, so $d_1 = 1$ (27 in Fig. 9) and $f(2,2) = \min(f(1,2) + d_2,\ f(2,1) + d_3,\ f(1,1) + 1) = \min(31+30,\ 31+30,\ 1+1) = 2$.
Similarly, comparing $s_3$ and $t_3$: $t_3$ is ranked second among the recognition candidate characters 10 of $s_3$, so $d_1 = 2$ (28 in Fig. 9) and $f(3,3) = \min(f(2,3) + d_2,\ f(3,2) + d_3,\ f(2,2) + 2) = \min(32+30,\ 32+30,\ 2+2) = 4$.
Similarly, comparing $s_4$ and $t_4$: $t_4$ is not among the recognition candidate characters 10 of $s_4$, so $d_1 = 20$ (29 in Fig. 9) and $f(4,4) = \min(f(3,4) + d_2,\ f(4,3) + d_3,\ f(3,3) + 20) = \min(33+30,\ 34+30,\ 4+20) = 24$.

The amount of deviation $f(4,4)$ between the input word character string "Okayama" 7 and the word "Okayama" 13 in the word dictionary 4 is therefore $f(4,4) = 1$ (26 in Fig. 9) $+ 1$ (27 in Fig. 9) $+ 2$ (28 in Fig. 9) $+ 20$ (29 in Fig. 9) $= 24$, and the shortest path is 30.

From the above results, the amount of deviation between the input word character string "Okayama" 7 and the word "Okaya" 12 in the word dictionary is 34, and that between "Okayama" 7 and the word "Okayama" 13 is 24. The candidate words are output in ascending order of deviation, "Okayama" 13 followed by "Okaya" 12, so the correct word is obtained at the top.
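The backtracking step that recovers paths such as 25 and 30, and the final sort by ascending deviation, can be sketched as follows. This is an illustrative Python sketch under the same assumptions as the recurrence: best-first candidate lists, $d_2 = d_3 = 30$ and $P = 20$. The helper names `d1_of`, `dp_table`, `backtrack` and `rank_words` are hypothetical. The text does not say how ties between predecessors are resolved, so the sketch checks them in a fixed order: diagonal, then insertion, then omission.

```python
def d1_of(cands, t, p):
    """1-based rank of t among best-first candidates, or P if t is absent."""
    return cands.index(t) + 1 if t in cands else p


def dp_table(candidates, word, d2=30, d3=30, p=20):
    """Full table f[i][j] of the conventional recurrence."""
    n, m = len(candidates), len(word)
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = f[i - 1][0] + d2
    for j in range(1, m + 1):
        f[0][j] = f[0][j - 1] + d3
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d1 = d1_of(candidates[i - 1], word[j - 1], p)
            f[i][j] = min(f[i - 1][j] + d2, f[i][j - 1] + d3, f[i - 1][j - 1] + d1)
    return f


def backtrack(candidates, word, d2=30, d3=30, p=20):
    """Walk back from (I, J) to (0, 0) along one minimum-deviation path."""
    f = dp_table(candidates, word, d2, d3, p)
    i, j = len(candidates), len(word)
    path = [(i, j)]
    while (i, j) != (0, 0):
        if i > 0 and j > 0 and f[i][j] == (
                f[i - 1][j - 1] + d1_of(candidates[i - 1], word[j - 1], p)):
            i, j = i - 1, j - 1          # s_i matched with t_j
        elif i > 0 and f[i][j] == f[i - 1][j] + d2:
            i -= 1                       # s_i skipped as an inserted character
        else:
            j -= 1                       # t_j skipped as an omitted character
        path.append((i, j))
    return path


def rank_words(candidates, dictionary):
    """(deviation, word) pairs sorted so the smallest deviation comes first."""
    return sorted((dp_table(candidates, w)[-1][-1], w) for w in dictionary)


okayama_input = [["オ"], ["カ"], ["マ", "ヤ"], ["ヌ"]]      # Fig. 4
print(backtrack(okayama_input, "オカヤ"))    # [(4, 3), (3, 3), (2, 2), (1, 1), (0, 0)]
print(backtrack(okayama_input, "オカヤマ"))  # [(4, 4), (3, 3), (2, 2), (1, 1), (0, 0)]
print(rank_words(okayama_input, ["オカヤ", "オカヤマ"]))  # [(24, 'オカヤマ'), (34, 'オカヤ')]
```

The first step back from (4, 3) is an insertion move to (3, 3). This matches path 25, where $s_4$ is charged the insertion penalty of 30.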
In addition, the word determining means 5 compares the input word character string "Okaya" 8 written on the form of Fig. 5, together with the noise 9 attached to the form, with the words in the word dictionary 4 shown in Fig. 6. For the word "Okaya" 12 in the word dictionary, the word determination method gives, as shown in Fig. 10, the amount of deviation $f(4,3) = 1$ (31 in Fig. 10) $+ 1$ (32 in Fig. 10) $+ 2$ (33 in Fig. 10) $+ 30$ (24 in Fig. 10) $= 34$, and the shortest path is 34. Next, the input is compared with the word "Okayama" 13 in the word dictionary 4. As shown in Fig. 11, the amount of deviation between the input word character string "Okaya" 8 plus the noise 9 and the word "Okayama" 13 is $f(4,4) = 1$ (35 in Fig. 11) $+ 1$ (36 in Fig. 11) $+ 2$ (37 in Fig. 11) $+ 20$ (38 in Fig. 11) $= 24$, and the shortest path is 39. From the above results, the amount of deviation between "Okaya" 8 plus the noise 9 and the word "Okaya" 12 is 34, while that for the word "Okayama" 13 is 24. The candidate words are therefore output with "Okayama" 13, which has the smaller deviation, first,
followed by "Okaya" 12, so the correct word "Okaya" 12 is not obtained at the top.
[Problem to be Solved by the Invention] In the conventional word reading device configured as described above, the penalty for an erroneously inserted character during matching is fixed. As a result, the correct word cannot be obtained if non-character noise or the like is present in an entry character frame of the form. This invention was made to solve this problem. Its object is to obtain a word reading device that can match words accurately even if noise or the like is present in an entry character frame of the form.
[Means for Solving the Problem] The present invention relates to a word reading device comprising:
- scanning means 2 that photoelectrically converts characters, etc. on a form 1 to generate image data;
- character recognition means 3 that performs character recognition on the image data of the characters, etc. and outputs recognition candidate character strings 10, 11;
- a word dictionary 4 in which various words are registered in advance; and
- word determining means 59 that compares the recognition candidate character strings 10, 11 with words 12, 13 selected from the word dictionary 4, performs a calculation using a predetermined calculation formula including various parameters, and ranks and selects words with a high degree of coincidence.
In this word reading device, pseudo-noise character detection means 6 is provided that detects and outputs the noise-likeness ratio of the image data of the characters, etc. The word determining means 59 performs the calculation with this noise-likeness ratio incorporated into the various parameters, and ranks and selects the words with a high degree of coincidence.
[Operation] If non-character noise or the like is present in the image data of the characters, etc., the pseudo-noise character detection means 6 detects the noise-likeness ratio. This ratio is added as a parameter to the conventional various parameters, and the calculation is performed with the predetermined calculation formula. The word determining means 59 then selects the words 12, 13 with which the recognition candidate character strings 10, 11 have the highest degree of coincidence, and determines the high-ranking words 12, 13 to be the character strings input from the form 1.
[Embodiment] An embodiment of the present invention is described below with reference to the drawings. In Fig. 1, 1 is a form on which characters are written, 2 is scanning means that photoelectrically converts and reads the form 1, and 3 is character recognition means that segments the input characters one at a time, recognizes them and outputs recognition candidate characters.
4 is a word dictionary storing reference words. 6 is pseudo-noise character detection means that observes the image of each input character written on the form and detects images that are not characters, that is, characters predicted to be noise. 59 is word determining means that determines the word read by comparing the recognition candidate characters output from the character recognition means 3 with the reference words in the word dictionary 4, using the pseudo-noise character information output from the pseudo-noise character detection means 6. The pseudo-noise character detection means 6 measures the height and width of each piece of image data input from the scanning means 2 and outputs the measured values. For the word reading device of the present invention, the following are assumed:
- the general form 1 shown in Fig. 3 is input;
- for the form 1 of Fig. 3, the character recognition means 3 outputs the recognition candidate characters 10 shown in Fig. 4 and the recognition candidate characters 11 shown in Fig. 5;
- words such as those shown in Fig. 6 are registered in advance in the word dictionary 4.
Fig. 2 is an explanatory diagram for explaining an example of the pseudo-noise character detection means 6 of the word reading device.
In Fig. 2, each of 40 to 47 is the width and height of one image:
- 40, 41, 42 and 43: the first character "オ", second character "カ", third character "ヤ" and fourth character "マ" of the input word character string "Okayama";
- 44, 45 and 46: the first character "オ", second character "カ" and third character "ヤ" of the input word character string "Okaya";
- 47: the noise image attached to the form.
Fig. 10 is an explanatory diagram illustrating how the input word character string "Okaya" 8 and the noise 9 of Fig. 3 are matched with the word "Okaya" 12 in the word dictionary 4 of Fig. 6 when the penalty for a character insertion error is reduced for a pseudo-noise character. In the figure, 31 is the distance between the first character "オ" of the input word character string "Okaya" 8 and "オ" of the word "Okaya" 12 in the word dictionary, 32 is the distance between the second character "カ" of the input word character string and "カ" of the word in the word dictionary 4,
33 is the distance between the third character "ヤ" of the input word character string and "ヤ" of the word in the word dictionary 4, 24 is the distance for a one-character insertion error, and 34 is the path that gives the shortest distance. Fig. 11 is an explanatory diagram illustrating the matching of the input word character string "Okaya" 8 and the noise 9 of Fig. 3 with the word "Okayama" 13 in the word dictionary 4 of Fig. 6, again with the penalty for a character insertion error reduced for a pseudo-noise character. In the figure, 48 is the path on which the penalty for a character insertion error is reduced, as in Fig. 10, 35 is the distance between the first character "オ" of the input word character string "Okaya" 8 and "オ" of the word "Okayama" 13 in the word dictionary 4, 36 is the distance between the second character "カ" of the input word character string and "カ" of the word in the word dictionary 4,
37 is the distance between the third character "ヤ" of the input word character string and "ヤ" of the word in the word dictionary 4, 38 is the distance between the noise 9 attached to the form and "マ" of the word in the word dictionary 4, and 39 is the path that gives the shortest distance. The word determining means 59 uses the general dynamic programming method shown in Fig. 7, referring to the output signal of the pseudo-noise character detection means 6 shown in Fig. 2.
Next, the operation is explained. The detection criterion of the pseudo-noise character detection means 6 is, for example, as follows: "When the width and the height of a character image are both 4 or less, the character is regarded as a pseudo-noise character."
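The criterion above can be sketched in Python, assuming that each segmented character image is available as a binary bitmap (rows of 0/1 values, 1 = black). Only the rule "width and height both 4 or less" comes from the text. The bounding-box measurement, the function names, the pixel unit and the sample bitmaps are hypothetical.

```python
def bounding_box_size(bitmap):
    """Return (width, height) of the black-pixel bounding box of a 0/1 bitmap."""
    rows = [r for r, row in enumerate(bitmap) if any(row)]
    cols = [c for row in bitmap for c, v in enumerate(row) if v]
    if not rows:
        return 0, 0                      # blank image: no black pixels at all
    return max(cols) - min(cols) + 1, max(rows) - min(rows) + 1


def is_pseudo_noise(bitmap, limit=4):
    """Pseudo-noise if both the width and the height are <= limit (assumed pixels)."""
    width, height = bounding_box_size(bitmap)
    return width <= limit and height <= limit


speck = [[0, 0, 0, 0, 0],
         [0, 1, 1, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 0, 0, 0]]                # hypothetical 2 x 2 blob, like noise 9
glyph = [[1] * 12 for _ in range(10)]    # hypothetical 12 x 10 character-sized image
print(is_pseudo_noise(speck))  # True
print(is_pseudo_noise(glyph))  # False
```

Because both dimensions must be small, a long thin stroke (for example 20 x 1) is not flagged. Under this sketch an entirely blank image would be flagged.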
This criterion is applied to the widths and heights of the input characters shown in Fig. 2. The widths and heights shown at 40 to 46 are all larger than 4, so none of the characters of the input word character strings "Okayama" and "Okaya" is a pseudo-noise character. The width and height shown at 47 are both 4 or less, so that image is determined to be a pseudo-noise character. Next, when matching against the word dictionary 4, the word determining means 59 reduces the penalty for a character insertion error for a pseudo-noise character. The recurrence of the word determination method therefore becomes as follows.

<Word determination method> The input word character string 7, 8 from the form, $S = s_1 s_2 \ldots s_I$, is compared with a word $T = t_1 t_2 \ldots t_J$ in the word dictionary 4. For this comparison, the concept of a distance between the characters $s_i$ and $t_j$ is introduced and denoted $d_1$. The amount of deviation $f(i, j)$, obtained when the partial strings $s_1 s_2 \ldots s_i$ and $t_1 t_2 \ldots t_j$ are best matched, is introduced and calculated with the following recurrence:

$$f(0,0) = 0, \qquad f(i,j) = \min\{\, f(i-1,j) + d_2,\; f(i,j-1) + d_3,\; f(i-1,j-1) + d_1 \,\}$$

$d_1$: the rank of $t_j$ among the recognition candidate characters of $s_i$ if $t_j$ is among them, otherwise $P$ (a constant)
$d_2$: the penalty for a character insertion error, specified below
$d_3$: a constant
$\min(X, Y, Z)$: the minimum of $X$, $Y$ and $Z$

As shown in Fig. 7, the paths into $f(i, j)$ 17 have the following meaning:
- Path from $f(i-1, j)$ 15: the matching of $s_1 s_2 \ldots s_{i-1}$ with $t_1 t_2 \ldots t_j$ has been completed with deviation $f(i-1, j)$ 15, and the character $s_i$ is not matched with any character of $T$ (a one-character insertion error is allowed). In this case $s_i$ is regarded as an extra character, so the penalty $d_2$ 19 is added and $f(i, j) = f(i-1, j) + d_2$.
- Path from $f(i, j-1)$ 16: this case is analogous. $t_j$ is an extra character that is not matched with any character of the input word character string $S$ (a one-character omission error is allowed), so the penalty $d_3$ 20 is added and $f(i, j) = f(i, j-1) + d_3$.
The word determining means 59 specifies $d_2$ as follows:

$$d_2 = \begin{cases} d_{2a} & \text{if } s_i \text{ is a pseudo-noise character} \\ d_{2b} & \text{otherwise} \end{cases}$$

With this recurrence, the word determining means 59 matches the input word character string "Okayama" 7 of Fig. 4 against the words in the word dictionary 4 of Fig. 6. None of the characters of "Okayama" 7 is a pseudo-noise character, so the predetermined values in the recurrence are $d_2 = d_{2b} = 30$, $d_3 = 30$ and $P = 20$, and the operation is exactly the same as in the conventional example. The amount of deviation $f(4,3)$ between the input word character string "Okayama" 7
and the word "Okaya" 12 in the word dictionary is $f(4,3) = 1$ (21 in Fig. 8) $+ 1$ (22 in Fig. 8) $+ 2$ (23 in Fig. 8) $+ 30$ (24 in Fig. 8) $= 34$. Backtracking the path that gives the minimum deviation $f(4,3)$ yields the path 25 through the points (4, 3), (3, 3), (2, 2), (1, 1) and (0, 0). The amount of deviation $f(4,4)$ between the input word character string "Okayama" 7 and the word "Okayama" 13 in the word dictionary 4 is $f(4,4) = 1$ (26 in Fig. 9) $+ 1$ (27 in Fig. 9) $+ 2$ (28 in Fig. 9) $+ 20$ (29 in Fig. 9) $= 24$, and the shortest path is 30. The word determining means 59 therefore outputs the candidate words in the order "Okayama" 13, which has the smaller deviation, then "Okaya" 12, and the correct word is obtained at the top.

Next, the input word character string "Okaya" 8 and the noise 9 of Fig. 5 are compared with the words in the word dictionary 4 of Fig. 6. Here the noise 9 is determined to be a pseudo-noise character. Incorporating the noise-likeness ratio into the recurrence gives $d_2 = d_{2a} = 10$ for it; for the other characters $d_{2b} = 30$ applies, and $d_3 = 30$ and $P = 20$. The matching operation is explained below. The input word character string $S$ is "Okaya" 8 plus the noise 9 of Fig. 5, and the word $T$ in the word dictionary 4 of Fig. 6 is "Okaya" 12.
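The modified recurrence, in which $d_2$ is chosen per input character, can be sketched by adding pseudo-noise flags to the conventional DP. This is an illustrative Python sketch, not the patent's implementation. The flag list (one boolean per input character, standing in for the output of the pseudo-noise character detection means 6) is an assumption. The parameter names `d2a` and `d2b` mirror the symbols $d_{2a}$ and $d_{2b}$, with the example values $d_{2a} = 10$, $d_{2b} = 30$, $d_3 = 30$ and $P = 20$.

```python
def deviation_with_noise(candidates, noise_flags, word,
                         d2a=10, d2b=30, d3=30, p=20):
    """DP deviation f(I, J) where the insertion penalty d2 depends on s_i.

    noise_flags[i] is True when input character s_(i+1) was judged to be a
    pseudo-noise character; its insertion penalty is then d2a instead of d2b.
    """
    n, m = len(candidates), len(word)
    d2 = [d2a if flag else d2b for flag in noise_flags]   # d2 for each s_i
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = f[i - 1][0] + d2[i - 1]
    for j in range(1, m + 1):
        f[0][j] = f[0][j - 1] + d3
    for i in range(1, n + 1):
        cands = candidates[i - 1]
        for j in range(1, m + 1):
            t = word[j - 1]
            d1 = cands.index(t) + 1 if t in cands else p
            f[i][j] = min(f[i - 1][j] + d2[i - 1],   # s_i inserted (cheap if noise)
                          f[i][j - 1] + d3,          # t_j omitted
                          f[i - 1][j - 1] + d1)      # s_i matched with t_j
    return f[n][m]


# "Okaya" 8 followed by the noise 9 (Fig. 5); the noise is recognized as "ノ".
noisy_input = [["オ"], ["カ"], ["マ", "ヤ"], ["ノ"]]
flags = [False, False, False, True]          # only s_4 is a pseudo-noise character
for word in ["オカヤ", "オカヤマ"]:
    print(word,
          deviation_with_noise(noisy_input, [False] * 4, word),  # conventional
          deviation_with_noise(noisy_input, flags, word))        # reduced penalty
# オカヤ 34 14
# オカヤマ 24 24
```

With every flag cleared, the sketch reproduces the conventional deviations (34 and 24, with "Okayama" ranked first). Flagging only $s_4$ lowers the deviation from "Okaya" to 14, which moves the correct word to the top. The deviation from "Okayama" stays at 24 because its best path matches $s_4$ with "マ" at $d_1 = 20$ rather than treating it as an insertion.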
Comparing $s_1$ ("オ" of the input word character string "Okaya" 8) with $t_1$ ("オ" of the word "Okaya" 12 in the word dictionary): $t_1$ is ranked first among the recognition candidate characters 11 of $s_1$, so $d_1 = 1$ (31 in Fig. 10) and $f(1,1) = \min(f(0,1) + d_{2b},\ f(1,0) + d_3,\ f(0,0) + 1) = \min(30+30,\ 30+30,\ 1) = 1$.
Similarly, comparing $s_2$ and $t_2$: $t_2$ is ranked first among the recognition candidate characters 11 of $s_2$, so $d_1 = 1$ (32 in Fig. 10) and $f(2,2) = \min(f(1,2) + d_{2b},\ f(2,1) + d_3,\ f(1,1) + 1) = \min(31+30,\ 31+30,\ 1+1) = 2$.
Similarly, comparing $s_3$ and $t_3$: $t_3$ is ranked second among the recognition candidate characters 11 of $s_3$, so $d_1 = 2$ (33 in Fig. 10) and $f(3,3) = \min(f(2,3) + d_{2b},\ f(3,2) + d_3,\ f(2,2) + 2) = \min(32+30,\ 32+30,\ 2+2) = 4$.
Similarly, comparing $s_4$ and $t_3$: $t_3$ is not among the recognition candidate characters 11 of $s_4$, so $d_1 = 20$. Moreover, $s_4$ is a pseudo-noise character, so the penalty on the character-insertion-error path 48 into $f(4, x)$ $(x = 0, 1, 2, 3)$ is $d_2 = d_{2a} = 10$. Hence $f(4,3) = \min(f(3,3) + d_{2a},\ f(4,2) + d_3,\ f(3,2) + 20) = \min(4+10,\ 42+30,\ 32+20) = 14$.
The amount of deviation $f(4,3)$ between the input word character string "Okaya" 8 plus the noise 9 and the word "Okaya" 12 in the word dictionary 4 is therefore $f(4,3) = 1$ (31 in Fig. 10) $+ 1$ (32 in Fig. 10) $+ 2$ (33 in Fig. 10) $+ 10$ (24 in Fig. 10) $= 14$, and the shortest path is 34.

On the other hand, the word "Okayama" 13 in the word dictionary 4 of Fig. 6 is compared with the input characters and noise of Fig. 5. The input word character string $S$ is "Okaya" 8 plus the noise 9, and the word $T$ in the word dictionary is "Okayama" 13. Comparing $s_1$ ("オ" of the input word character string "Okaya" 8) with $t_1$
("O" in the word "Okayama" 13 in the word dictionary), t is in the first position of the recognition candidate characters 11 of s, so d, = 1 (35) and ' (1, 11””m 1 n (f (0+ll
+dtb+ 'Tl+61"ds, f (
(1,0>+1)=min (30+30.3
0+30.1)l. Similarly, when comparing S2 and t2, t2 is in the first position of the 32 recognition candidate characters 11, so d, = 1 (36), and f u, t+ = m l n (f n, t, +dt
b* f <t, +. +d3・fil・+1+1) =min (31+30.31+30°1+1)=2. Similarly, when comparing S and Tomo, there is t in the second position of the recognition candidate character 11 of S, so d,=2(37)' (3+31 ``rn l n (' (!+1 )
"dRbr' +3.21+d, f(!
・ri+2) = min (32+30.32+30°2+2). Similarly, when comparing s4 and t4, t4 does not exist among the 34 recognition candidate characters, so d, = 20, and S4 is a pseudo-noise character, so f(4,. (x = 0.1.2 .3), the penalty for character insertion error path 48 is dt=az,=10, and r (4
, 41=min (f <s, a+
+ d211+ r <a, s++d3・f
(3・3)+20) = min (33+10.34+30°4+20) =24. In other words, the amount of deviation between the input word character string "Okaya" 8 and the noise 9 and the word "Okayama" 13 in the word dictionary '(
4+4) is f <a, a> ” 1' (35 in Figure 11)+
1 (36 in FIG. 11) + 2 (37 in FIG. 11) + 20 (38 in FIG. 11) = 24, and the shortest path is 39. From the above results, the deviation between the input word string "OKAYA" 8 plus noise 9 and the dictionary word "OKAYA" 12 is 14, whereas the deviation from the dictionary word "OKAYAMA" 13 is 24. The candidate words are therefore output in ascending order of deviation, "OKAYA" first and "OKAYAMA" second, so the correct word "OKAYA" is obtained at the top.

In this way, the pseudo-noise character detection means 6 examines the image of each input character to detect characters that are likely to be noise, and word matching is performed with a reduced penalty for character insertion errors on characters judged to be pseudo-noise characters. As a result, words can be matched accurately even in the presence of noise.

In the above embodiment, the width and height of the character image were used as the detection criteria of the pseudo-noise character detection means 6, but other criteria, such as the number of black pixels in the image or the shape of the image, may also be used. In the above embodiment, the word determining means 59 uses DP matching, based on the concept of dynamic programming, as the word matching method, but other matching methods may be used. In the above embodiment, when the input word string is matched against the characters constituting a word in the word dictionary, the rank of the recognition candidate characters is used as the degree of matching, but other information may be used instead, for example the probability, obtained during character recognition, that the input character is that candidate character. In addition, although the above embodiment describes surnames written in katakana, this invention can also be applied to words written in kanji or English letters, and to other words such as addresses and company names. Furthermore, although the noise-likeness ratio is set to d1 = d2 = 10, it is not limited to this value, and other values may be used.

[Effects of the Invention]

As explained above, according to this invention, the scanning means photoelectrically converts characters, etc. on a form to generate image data; the character recognition means performs character recognition on this image data and outputs a recognition candidate character string; and the word determining means compares the recognition candidate character string with words selected from a word dictionary in which words are registered in advance, performs calculations using a predetermined calculation formula containing various parameters, and ranks and selects the words with a high degree of coincidence. The word reading device further comprises pseudo-noise character detection means that detects and outputs the noise-likeness ratio of the image data, and the word determining means performs its calculations with this noise-likeness ratio incorporated into the various parameters before ranking and selecting the words with a high degree of coincidence. Consequently, even when non-character noise or the like is present on the form, the input character string can be accurately collated with the reference words, and words can be determined with high precision.
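The DP matching performed by the word determining means 59 in the embodiment can be illustrated as a weighted edit distance in which the insertion penalty drops for characters flagged as pseudo-noise. The Python below is a minimal sketch under stated assumptions, not the patented implementation: the penalty values, the 0.3 size ratio, the 40-pixel nominal character box, the example candidates, and all names (`InputChar`, `is_pseudo_noise`, `match_cost`, `deviation`, `rank_words`, `EXAMPLE`) are hypothetical. It also simplifies the noise-likeness ratio to a yes/no flag from the width and height test, uses candidate rank as the match cost as the embodiment does, and does not attempt to reproduce the deviation values computed in the worked example.

```python
from dataclasses import dataclass

# Hypothetical penalty values, chosen for illustration only (not from the patent).
INSERT_PENALTY = 20        # extra input character with no counterpart in the word
NOISE_INSERT_PENALTY = 2   # reduced insertion penalty for a pseudo-noise character
DELETE_PENALTY = 20        # word character with no counterpart in the input
MISS_PENALTY = 20          # word character absent from the candidate list


@dataclass
class InputChar:
    candidates: list[str]  # recognition candidates, best first
    width: int             # width of the segmented character image, in pixels
    height: int            # height of the segmented character image, in pixels


def is_pseudo_noise(ch: InputChar, nominal: int, ratio: float = 0.3) -> bool:
    """Hypothetical width/height test: both dimensions far below the nominal size."""
    limit = nominal * ratio
    return ch.width < limit and ch.height < limit


def match_cost(ch: InputChar, target: str) -> int:
    """Candidate rank as the degree of matching: 0 for the top candidate."""
    if target in ch.candidates:
        return ch.candidates.index(target)
    return MISS_PENALTY


def deviation(inputs: list[InputChar], word: str, nominal: int) -> int:
    """DP matching (weighted edit distance) between the input string and a word."""
    ins = [NOISE_INSERT_PENALTY if is_pseudo_noise(c, nominal) else INSERT_PENALTY
           for c in inputs]
    n, m = len(inputs), len(word)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + ins[i - 1]
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + DELETE_PENALTY
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + match_cost(inputs[i - 1], word[j - 1]),  # match/substitute
                d[i - 1][j] + ins[i - 1],                                  # input char inserted
                d[i][j - 1] + DELETE_PENALTY,                              # word char dropped
            )
    return d[n][m]


def rank_words(inputs: list[InputChar], dictionary: list[str], nominal: int) -> list[str]:
    """Order dictionary words by ascending deviation (ties broken alphabetically)."""
    return sorted(dictionary, key=lambda w: (deviation(inputs, w, nominal), w))


# Hypothetical example: "オカヤ" followed by a small smudge read as "ミ".
EXAMPLE = [
    InputChar(["オ", "才"], 40, 40),
    InputChar(["カ", "力"], 40, 40),
    InputChar(["ヤ", "セ"], 40, 40),
    InputChar(["ミ", "ン"], 6, 5),
]
print(rank_words(EXAMPLE, ["オカヤマ", "オカヤ"], nominal=40))
```

With the small fourth image flagged as noise, inserting it costs only 2, so "オカヤ" (deviation 2) ranks ahead of "オカヤマ" (deviation 20). If the same image were full-size, the insertion would cost 20 and the two words would tie at 20, which shows why lowering the insertion penalty for pseudo-noise characters matters.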

[Brief explanation of the drawing]

FIG. 1 is a configuration diagram of a word reading device according to an embodiment of the present invention; FIG. 2 is a diagram showing the values of the noise-likeness ratio; FIG. 3 is an explanatory diagram showing an example of a form used with the word reading device; FIGS. 4 and 5 are explanatory diagrams showing examples of input characters and recognition candidate characters in the word reading device; FIG. 6 is an explanatory diagram showing an example of word strings in the word dictionary of the word reading device; FIG. 7 is an explanatory diagram showing an example of the word determination method in the word reading device; FIGS. 8, 9, 10, and 11 are explanatory diagrams showing an example of word matching in the word reading device; and FIG. 12 is a configuration diagram of a conventional word reading device.

1 ... form, 2 ... scanning means, 3 ... character recognition means, 4 ... word dictionary, 6 ... pseudo-noise character detection means, 7, 8 ... input characters, 10, 11 ... recognition candidate character strings, 12, 13 ... reference words, 59 ... word determining means.

Claims

[Scope of Claims] A word reading device comprising: a scanning means for photoelectrically converting characters, etc. on a form to generate image data; a character recognition means for performing primary character recognition on the image data of the characters, etc. and outputting a recognition candidate character string; a word dictionary in which various words are registered in advance; and a word determining means for comparing the recognition candidate character string with words selected from the word dictionary, performing a calculation using a predetermined calculation formula including various parameters, and ranking and selecting words with a high probability of identity; the word reading device being characterized by further comprising a pseudo-noise character detecting means for detecting and outputting a non-character element ratio for the image data of the characters, etc., wherein the word determining means performs the calculation with the non-character element ratio incorporated into the various parameters, and ranks and selects words with a high probability value of identity.