JP2001266074A

JP2001266074A - Device for recognizing character

Info

Publication number: JP2001266074A
Application number: JP2000074791A
Authority: JP
Inventors: Aki Sugawara; 亜紀菅原
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2000-03-16
Filing date: 2000-03-16
Publication date: 2001-09-28

Abstract

PROBLEM TO BE SOLVED: To prevent an increase in the operator's workload even when the word dictionary contains many similar words. SOLUTION: A character string 13 on a form is read by an image input unit 1 and segmented by a character segmentation unit 2. A character recognition unit 3 recognizes the character string 13 and outputs character recognition result candidates 9. A word matching unit 4 checks the degree of coincidence 12 between the candidates 9 and each word in a word dictionary 6, and outputs the words with a high degree of coincidence 12 as word recognition result candidates 10, i.e., recognition result candidates for the character string 13. When there are multiple word recognition result candidates 10 and some of them are similar to one another, a similar word determination unit 5 selects one of the similar candidates 10 as the word recognition result 11, i.e., the recognition result of the character string 13.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a character recognition device, and more particularly to a character recognition device that recognizes character strings.

[0002]

[Description of the Related Art] Conventionally, character recognition devices of this type have been used to read words written on forms.

[0003] Referring to FIG. 10, a block diagram of this conventional character recognition device, the conventional device comprises: an image input unit that reads a form and outputs image data; a character segmentation unit that segments a character string from the image data; a character recognition unit that recognizes the characters of the character string one at a time, for as many characters as the string contains, and outputs character recognition result candidates for each recognized character; and a word matching unit that checks the degree of coincidence between the character recognition result candidates for the character string and each word in a word dictionary, in which the words to be entered on the form are stored in advance, and outputs a word with a high degree of coincidence as the word recognition result, i.e., the recognition result of the character string. When a word dictionary containing similar words is used, however, the word matching unit often obtains several words with a high degree of coincidence and outputs these similar dictionary words as the word recognition result. In that case, an operator refers to the image data read from the form and selects the word corresponding to the image data as the word recognition result.

[0004]

[Problem to Be Solved by the Invention] In the conventional device described above, the word matching unit outputs the words with a high degree of coincidence as the word recognition result. With a word dictionary that contains similar words, several such words are obtained and all of them are output, so the operator must refer to the image data read from the form and select, from the words output by the word matching unit, the one corresponding to the image data. As a result, the operator's workload increases when the word dictionary contains many similar words.

[0005] To eliminate this drawback, an object of the present invention is to provide a character recognition device that does not increase the operator's workload even when the word dictionary contains many similar words.

[0006]

[Means for Solving the Problem] In a first character recognition device according to the present invention, when a character string is read and recognized and there are multiple recognition result candidates for the character string, among which similar candidates exist, one of the similar candidates is selected as the recognition result of the character string.

[0007] In a second character recognition device according to the present invention, when a character string representing a word is read and recognized and there are multiple recognition result candidates for the character string, among which similar candidates exist, one of the similar candidates is selected as the recognition result of the word.

[0008] In a third character recognition device according to the present invention, when a character string representing a word written on a form is read and recognized and there are multiple recognition result candidates for the character string, among which similar candidates exist, one of the similar candidates is selected as the recognition result of the word.

[0009] In the first to third character recognition devices, the recognition result is obtained by performing a "certainty" check, which determines which of the characters that differ among the similar recognition result candidates is most certain, and selecting the candidate that contains the most certain character.

[0010] A fourth character recognition device according to the present invention comprises: an image input unit that reads a form and outputs image data; a character segmentation unit that segments a character string from the image data; a character recognition unit that recognizes the characters of the character string one at a time, for as many characters as the string contains, and outputs character recognition result candidates for each recognized character; a word matching unit that checks the degree of coincidence between the character recognition result candidates output by the character recognition unit and each word in a word dictionary, in which the words to be entered on the form are stored in advance, and outputs the words with a high degree of coincidence as word recognition result candidates, i.e., recognition result candidates for the character string; and a similar word determination unit that, when the word matching unit outputs multiple word recognition result candidates among which similar candidates exist, selects one of the similar candidates as the word recognition result, i.e., the recognition result of the character string.

[0011] In the fourth character recognition device, the word matching unit uses only the dictionary words that have the same number of characters as the character string. For every character in the string, it checks whether the character at the same position in the word is identical to a character recognition result candidate for that character, and takes the number of identical positions as the degree of coincidence.
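The position-wise matching of [0011] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `degree_of_coincidence` and `match_words` are hypothetical, and each position's recognition candidates are assumed to arrive as a plain list of characters.

```python
from typing import List, Sequence, Tuple


def degree_of_coincidence(candidates: Sequence[Sequence[str]], word: str) -> int:
    """Count the positions at which the dictionary word's character is among
    the recognition candidates for that position of the character string."""
    if len(word) != len(candidates):
        raise ValueError("only words as long as the character string are compared")
    return sum(1 for cands, ch in zip(candidates, word) if ch in cands)


def match_words(candidates: Sequence[Sequence[str]],
                dictionary: Sequence[str]) -> Tuple[List[str], int]:
    """Return the same-length dictionary words with the highest degree of
    coincidence, together with that degree (empty list and 0 if none)."""
    same_length = [w for w in dictionary if len(w) == len(candidates)]
    if not same_length:
        return [], 0
    scores = [degree_of_coincidence(candidates, w) for w in same_length]
    best = max(scores)
    # Ties are kept: several words may share the best score (similar words).
    return [w for w, s in zip(same_length, scores) if s == best], best
```

Returning every word tied at the best score mirrors the situation the invention addresses: similar dictionary words often tie, and those ties are what the similar word determination unit later resolves.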

[0012] Further, in the fourth character recognition device, the similar word determination unit performs the "certainty" check on the characters that differ among the similar word recognition result candidates and takes the candidate containing the most certain character as the word recognition result.

[0013] In the first to fourth character recognition devices, the "certainty" of a character is calculated by adding one half of the average of the distance values of the character's similar characters to the distance value of the character itself.
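The "certainty" rule in [0013] reduces to a one-line computation. The sketch below follows that prose definition; the function name is hypothetical, and the handling of a character with no similar characters is an assumption the source does not address.

```python
from statistics import mean
from typing import Sequence


def certainty(own_distance: float, similar_distances: Sequence[float]) -> float:
    """'Certainty' of a character: its own distance value plus one half of the
    average distance value of its similar characters (smaller = more certain)."""
    if not similar_distances:
        # Assumption: with no similar characters, use the own distance alone.
        return float(own_distance)
    return own_distance + mean(similar_distances) / 2
```

For example, a character with distance value 50 whose two similar characters have distance values 30 and 50 gets a certainty of 50 + 40 / 2 = 70.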

[0014]

[Embodiments of the Invention] Next, an embodiment of the present invention will be described with reference to the drawings.

[0015] FIG. 1 is a block diagram showing an embodiment of the character recognition device of the present invention.

[0016] The embodiment shown in FIG. 1 comprises: an image input unit 1 that reads a form or the like and outputs image data; a character segmentation unit 2 that segments a character string 13 from the image data; a character recognition unit 3 that recognizes the characters of the character string 13 one at a time, for as many characters as the string contains, and outputs character recognition result candidates 9 for each recognized character (the characters with small distance values are taken as the candidates 9, and normally several are output; the distance value indicates the degree of difference between the recognized character and a character recognition result candidate 9); a word matching unit 4 that checks the degree of coincidence 12 between the candidates 9 output by the character recognition unit 3 and each word in a word dictionary 6, in which the words to be entered on forms and the like are stored in advance, and outputs the words with a high degree of coincidence 12 as word recognition result candidates 10, i.e., recognition result candidates for the character string 13; and a similar word determination unit 5 that, when the word matching unit 4 outputs multiple word recognition result candidates 10 among which similar candidates exist, selects one of the similar candidates 10 as the word recognition result 11, i.e., the recognition result of the character string 13.

[0017] The word matching unit 4 uses the words in the word dictionary 6 that have the same number of characters as the character string 13. For every character in the character string 13, it checks whether the character at the same position in the word is identical to a character recognition result candidate 9 for that character, and takes the number of identical positions as the degree of coincidence 12.

[0018] The similar word determination unit 5 performs a "certainty" check that determines which of the characters differing among the similar word recognition result candidates 10 is most certain, and takes the candidate 10 containing the most certain character as the word recognition result 11.

[0019] FIG. 1 also shows a similar word table 7 and a similar character table 8, which are used by the similar word determination unit 5.

[0020] Next, the operation of the character recognition device of this embodiment will be described in detail with reference to FIGS. 2 to 9.

[0021] FIG. 2 is a flowchart showing an example of the operation of the similar word determination unit.

[0022] FIG. 3 shows an example of the word dictionary: the words written on forms and the like are stored in the word dictionary 6 in advance, in order of word number.

[0023] FIG. 4 shows an example of the similar word table. Among the words in the word dictionary 6, words of the same length that differ in only one character are called similar words 14. The table is created in advance and consists of two parts: (a), which lists, for each group of similar words (identified by a similar word number), the number of characters in the words, the number of similar words 14, and the characters in which they differ (hereinafter, differentiating characters 15); and (b), which lists, for each word number in the word dictionary 6, the position of the differentiating character 15 and the similar word number.
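One way to precompute both parts of the similar word table described in [0023] is to mask each position of each word in turn: words that share a masked key differ only at that position. This is an illustrative sketch; the masking approach, function name, and the numbering of similar word groups by first appearance are assumptions, not details given in the source.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def build_similar_word_table(dictionary: Sequence[str]):
    """Group equal-length dictionary words that differ in exactly one position.

    Returns (table_a, table_b):
      table_a: similar word No -> (word length, number of similar words,
               differentiating characters)
      table_b: word No -> list of (differentiating position, similar word No)
    All numbers are 1-based, as in the figures.
    """
    groups: Dict[Tuple[int, int, str], List[int]] = defaultdict(list)
    for word_no, word in enumerate(dictionary, start=1):
        for pos in range(len(word)):
            # Mask one position; words sharing this key differ only there.
            key = (len(word), pos, word[:pos] + "\0" + word[pos + 1:])
            groups[key].append(word_no)

    table_a: Dict[int, Tuple[int, int, List[str]]] = {}
    table_b: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    similar_no = 0
    for (length, pos, _), members in groups.items():
        if len(members) < 2:
            continue  # a lone word has no similar words at this position
        similar_no += 1
        chars = [dictionary[n - 1][pos] for n in members]
        table_a[similar_no] = (length, len(members), chars)
        for n in members:
            table_b[n].append((pos + 1, similar_no))
    return table_a, dict(table_b)
```

A word can belong to several groups (one per position at which it has a near neighbor), which is why part (b) maps a word number to a list of entries.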

[0024] FIG. 5 shows an example of the similar character table. For each differentiating character 15 shown in FIG. 4, the table stores the characters that the character recognition unit 3 rates as highly similar to it during recognition: the number of such characters (number of similar characters) and the characters themselves (similar characters). The table is created in advance from experiments on the characteristics of the character recognition unit 3.
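The similar character table is used to restrict the re-recognition targets (step S3 of FIG. 2). The sketch below assembles that restricted character set and ranks targets by distance. The `distance` matcher is a hypothetical callable standing in for the character recognition unit 3, whose internals the source does not describe; the function names are likewise illustrative.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def recognition_targets(differentiating: Sequence[str],
                        similar_chars: Dict[str, Sequence[str]]) -> List[str]:
    """Restricted character set: the differentiating characters plus their
    similar characters, in table order, without duplicates."""
    targets: List[str] = []
    for ch in differentiating:
        for t in [ch, *similar_chars.get(ch, ())]:
            if t not in targets:
                targets.append(t)
    return targets


def restricted_recognize(image: object, targets: Sequence[str],
                         distance: Callable[[object, str], float]
                         ) -> List[Tuple[str, float]]:
    """Re-recognize one character image against the restricted targets only,
    returning (character, distance value) pairs, smallest distance first.
    `distance` is a hypothetical matcher supplied by the caller."""
    return sorted(((t, distance(image, t)) for t in targets), key=lambda p: p[1])
```

Limiting the comparison to a handful of targets is what lets the re-recognition focus on the few characters that actually distinguish the similar words.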

[0025] FIG. 6 shows an example of character recognition result candidates: when the character recognition unit 3 recognizes the characters written on a form (that is, the characters segmented by the character segmentation unit), the candidates for each written character are listed in descending order of similarity.

[0026] FIG. 7 shows an example of the degree of coincidence of the words in the word dictionary.

[0027] FIG. 8 shows an example of a re-recognition result.

[0028] FIG. 9 shows an example of the calculation process and results of the "certainty".

[0029] In FIG. 1, the image input unit 1 reads a form or the like and outputs image data, and the character segmentation unit 2 segments, for example, a character string 13 representing a word from that image data. The character recognition unit 3 recognizes the characters of the character string 13 one at a time, for as many characters as the string contains, and outputs, for each recognized character, one or more character recognition result candidates 9 together with the distance value of each candidate. The word matching unit 4 checks the degree of coincidence 12 between the candidates 9 for the character string 13 and each word in the word dictionary 6, and outputs the word or words with a high degree of coincidence 12 as word recognition result candidates 10. In doing so, it uses the dictionary words that have the same number of characters as the character string 13, checks for every character in the string whether the character at the same position in the word is identical to a candidate 9 for that character, and takes the number of identical positions as the degree of coincidence 12. The similar word determination unit 5 receives the word recognition result candidates 10 output by the word matching unit 4. If there is only one candidate 10, it becomes the word recognition result 11. If there are several candidates 10 and none of them are similar to one another, all of them are output as the word recognition result 11. If similar candidates 10 exist among them, one of the similar candidates is selected as the word recognition result 11: a "certainty" check determines which of the characters differing among the similar candidates 10 is most certain, and the candidate containing the most certain character becomes the word recognition result 11.

[0030] The operation of the similar word determination unit 5 will now be described in more detail with reference to FIG. 2. First, the unit receives the word recognition result candidates 10 from the word matching unit 4 (S1) and checks, with reference to the similar word table 7 of FIG. 4, whether any of the candidates 10 are similar words 14 of one another (S2). Specifically, it consults part (b) of the similar word table 7 and looks up, for every candidate 10 received in step S1, the similar word numbers associated with the candidate's word number in the word dictionary 6; when, for example, the same similar word number is obtained from at least two candidates 10, those candidates are judged to be similar words 14 under that similar word number. If no similar words 14 exist, processing continues at step S8. If similar words 14 exist, the character recognition unit 3 re-recognizes the character at the differentiating character position in the character string 13, restricting the recognition targets to the differentiating characters 15 of those similar words 14, shown in FIG. 4(a), and their similar characters, shown in FIG. 5. This yields, as shown in FIG. 8, the character codes of the character recognition result candidates 9 (the differentiating characters 15 and their similar characters) and the distance values of those candidates (S3). From this result, the "certainty" of each differentiating character 15 is calculated using Formula 1:

$$R_c = \gamma_c + \frac{1}{2N_c}\sum_{x=1}^{N_c}\gamma_x$$

where $R_c$ is the "certainty" of differentiating character $c$, $\gamma_c$ is the distance value of $c$, $\gamma_1, \ldots, \gamma_{N_c}$ are the distance values of the similar characters of $c$, and $N_c$ is the number of similar characters of $c$. In this way, not only each differentiating character 15 but also its similar characters are reflected in the "certainty": Formula 1 adds one half of the average distance value of the similar characters of $c$ to the distance value of $c$. As with distance values, a smaller "certainty" value indicates a higher degree of certainty (S4). If, as a result of the calculation, the "certainty" of one differentiating character 15 is, for example, below a predetermined value and smaller than the "certainty" of every other differentiating character 15 by at least another predetermined value (both values determined in advance by experiment or the like), that differentiating character 15 is taken as the determination result; if these conditions are not satisfied, there is no determination result (S5). The unit then checks whether there is a determination result (S6). If there is, the word recognition result candidate 10 among the similar words 14 found in step S2 that contains the determined differentiating character is output as the word recognition result 11 (S7); if there is not, the word recognition result candidates 10 received from the word matching unit 4 are output unchanged (S8).
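Steps S5 to S8 can be sketched as below, assuming the certainties from S4 are already available as a mapping from differentiating character to value. The source leaves both thresholds as experimentally determined constants (its worked example uses 100 and 10); the function and parameter names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def decide(certainties: Dict[str, float], limit: float, margin: float) -> Optional[str]:
    """S5: return the differentiating character whose certainty is below
    `limit` and smaller than every other certainty by at least `margin`."""
    if not certainties:
        return None
    best = min(certainties, key=certainties.get)
    others = [v for c, v in certainties.items() if c != best]
    if certainties[best] < limit and all(v - certainties[best] >= margin for v in others):
        return best
    return None  # no determination result


def similar_word_result(all_candidates: Sequence[str], similar: Sequence[str],
                        position: int, certainties: Dict[str, float],
                        limit: float, margin: float) -> List[str]:
    """S5 to S8: if a differentiating character is decided, return the similar
    candidate containing it at `position` (0-based); otherwise return the
    word matching unit's candidates unchanged."""
    chosen = decide(certainties, limit, margin)      # S5
    if chosen is not None:                           # S6: determination made
        for word in similar:
            if word[position] == chosen:
                return [word]                        # S7
    return list(all_candidates)                      # S8
```

The margin test is what keeps the unit from guessing: when two differentiating characters score nearly alike, all candidates are passed through and the operator decides, as in the conventional device.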

【００３１】さらに、例をあげて、図２にしたがって類
似単語判定部５の動作を説明する。帳票等に記載され文
字切り出し部２により切り出された文字列１３を「府中
西小学校」とし、文字認識部３が出力した文字認識結果
候補９が図６に示すとおりとする。そして、単語照合部
４によりこの図６で示す文字認識結果候補９を参照して
単語照合を行うと、「府中南小学校」、「府中北小学
校」、「府中西小学校」の３つの単語の一致度１２がも
っとも高く単語認識結果候補１０となる。すなわち、図
３で示す単語辞書６内の各単語と図６で示す文字認識結
果候補９とを比較し、単語Ｎｏが１の「青葉台小学校」
のまず一文字目「青」が、文字認識結果候補９の一文字
目「府」，「底」，「庶」，「麻」及び「庇」と一致す
るか否かを調べると、結果は一致してない。二文字目も
同様に「葉」が、「中」，「申」，「甲」，「巾」及び
「や」と一致するか否かを調べると、結果は一致してな
い。三文字目以降も同様に調べると、この単語Ｎｏが１
の「青葉台小学校」の場合、四，五及び六文字目の
「小」，「学」及び「校」に対応する文字認識結果候補
９内に図６に示すように単語辞書６と同一の漢字
「小」，「学」及び「校」が存在するため、この「青葉
台小学校」の単語内で一致した文字数は「３」すなわち
一致度１２が３となる。同様に全ての単語辞書６内の単
語と図６で示す文字認識結果候補９との照合を行うと、
結果は図７に示すようになるため、一致度１２がもっと
も高い「府中南小学校」「府中北小学校」「府中西小学
校」の３単語を単語認識結果候補１０として出力する。
しかしながら、この時点では３つの単語の内、いずれが
最も確からしいかの判断ができない。そして、図２に示
すフローにしたがって類似単語判定部５により、単語照
合部４の出力した単語認識結果候補１０「府中南小学
校」，「府中北小学校」及び「府中西小学校」の３単語
を受け（Ｓ１）、この３単語中に類似単語１４があるか
否かを調べる。すなわち、図４に示す類似単語テーブル
７の（ｂ）を参照すると、単語認識結果候補１０「府中
南小学校」の単語Ｎｏ４から類似単語Ｎｏ１とＮｏ２と
が、また、単語認識結果候補１０「府中北小学校」の単
語Ｎｏ５及び「府中西小学校」の単語Ｎｏ６からは類似
単語Ｎｏ２がそれぞれ得られるため、例えば、少なくと
も２つ以上の単語認識結果候補１０から同じ類似単語Ｎ
ｏが得られたときにこれらの単語認識結果候補１０がこ
の類似単語Ｎｏによる「類似単語１４である」と判定す
ると、類似単語Ｎｏ２による類似単語１４があるとの判
定となる（Ｓ２）。次に、字種限定再認識を行う、つま
り、ステップＳ２で判定した類似単語Ｎｏ２を使用し
て、図４の（ａ）に示す類似単語テーブル７のこの類似
単語Ｎｏ２に対応する区別化文字１５である「南」，
「北」及び「西」とその類似文字（図５に示す）
「商」，「甫」，「雨」，「比」，「此」，「酉」，
「両」及び「面」とに認識するときの照合対象を限定し
て、文字列１３中の区分か文字位置（この場合３）の文
字（この場合「西」）を文字認識部３により再認識し、
この結果として図８に示す再認識結果を得る（この再認
識においては文字認識結果候補９の距離値も併せて取得
する。）（Ｓ３）。次に、区別文字確からしさ計算で
は、この再認識結果より、前述した「式１」を用いて、
それぞれの「確からしさ」を算出する。算出経過及び結
果は図９のとおりとなる（Ｓ４）。判定の条件が、「確
からしさ」が例えば１００未満でかつ他のどの値よりも
例えば１０以上小さい場合に、この区別化文字１５を判
定結果とすると、区別化文字１５「西」の「確からし
さ」が「６９」であり条件を満たすため、区別化文字１
５「西」を含む単語認識結果候補１０「府中西小学校」
を単語認識結果１１として出力する（Ｓ５，Ｓ６，Ｓ
７）。Further, the operation of the similar word determination unit 5 will be described with reference to FIG. The character string 13 described in the form or the like and cut out by the character cutout unit 2 is assumed to be “Fuchu Nishi Elementary School”, and the character recognition result candidates 9 output by the character recognition unit 3 are as shown in FIG. When word matching is performed by the word matching unit 4 with reference to the character recognition result candidate 9 shown in FIG. 6, the three words "Fuchu Minami Elementary School", "Fuchu Kita Elementary School", and "Fuchu Nishi Elementary School" match. The degree 12 is the highest and becomes the word recognition result candidate 10. That is, each word in the word dictionary 6 shown in FIG. 3 is compared with the character recognition result candidate 9 shown in FIG.
First, if it is checked whether the first character “blue” matches the first characters “fu”, “bottom”, “common”, “hemp” and “eave” of the character recognition result candidate 9, the results match. Not. Similarly, as to the second character, if it is determined whether or not “leaf” matches “medium”, “shin”, “instep”, “width”, and “ya”, the result does not match. When the same is performed on the third and subsequent characters, the word No. is 1
In the case of “Aobadai Elementary School”, in the character recognition result candidates 9 corresponding to the fourth, fifth and sixth characters “small”, “study” and “school”, as shown in FIG. Since “small”, “study” and “school” exist, the number of matching characters in the word “Aobadai Elementary School” is “3”, that is, the degree of matching 12 is 3. Similarly, when the words in all the word dictionaries 6 are collated with the character recognition result candidates 9 shown in FIG.
Since the result is as shown in FIG. 7, three words “Fuchu Minami Elementary School”, “Fuchu Kita Elementary School”, and “Fuchu Nishi Elementary School” having the highest matching degree 12 are output as word recognition result candidates 10.
However, at this point, it cannot be determined which of the three words is most likely. Then, the similar word determination unit 5 receives three words of the word recognition result candidates 10 “Fuchu Minami Elementary School”, “Fuchu Kita Elementary School”, and “Fuchu Nishi Elementary School” output by the word matching unit 4 according to the flow shown in FIG. (S1) It is checked whether or not there is a similar word 14 among these three words. That is, referring to (b) of the similar word table 7 shown in FIG. 4, similar words No. 1 and No. 2 from the word No. 4 of the word recognition result candidate 10 “Fuchu Minami Elementary School”, and the word recognition result candidate 10 “Fuchu North” Since the similar word No. 2 is obtained from the word No. 5 of “elementary school” and the word No. 6 of “Fuchu west elementary school”, for example, the same similar word N is obtained from at least two or more word recognition result candidates 10.
If these word recognition result candidates 10 are determined to be “similar words 14” by the similar word No when o is obtained, it is determined that there is a similar word 14 by the similar word No2 (S2). Next, character type limited re-recognition is performed, that is, using the similar word No2 determined in step S2, the differentiating character 15 corresponding to the similar word No2 in the similar word table 7 shown in FIG. "South",
"North" and "West" and their similar characters (shown in Fig. 5)
"Quote", "ho", "rain", "ratio", "this", "rooster",
The character to be collated when recognizing “both” and “face” is limited, and the character (in this case, “west”) in the character string 13 or the character at the character position (in this case, 3) is reproduced by the character recognition unit 3. Recognized,
As a result, a re-recognition result shown in FIG. 8 is obtained (in this re-recognition, the distance value of the character recognition result candidate 9 is also acquired) (S3). Next, in the calculation of the likelihood of distinguishing characters, from this re-recognition result,
Calculate the certainty of each. The calculation progress and the result are as shown in FIG. 9 (S4). If the determination condition is that “probability” is, for example, less than 100 and smaller than any other value by, for example, 10 or more, and if the distinguished character 15 is the determination result, the “probability” of the differentiated character 15 “west” "Is" 69 ", which satisfies the condition.
Therefore, "Fuchu Nishi Elementary School", the word recognition result candidate 10 that contains "West", is output as the word recognition result 11 (S5, S6, S7).
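The similar-word check in steps S1 and S2 amounts to grouping the candidates by similar-word number. This minimal Python sketch assumes that part (b) of the similar word table can be held as a dictionary from word number to similar-word numbers. The function name is hypothetical. The usage lines contain only the three table entries mentioned in the example above, not the full FIG. 4(b).

```python
from collections import defaultdict


def find_similar_groups(candidate_word_nos, similar_word_table, min_count=2):
    """Steps S1-S2: find similar-word numbers shared by several candidates.

    candidate_word_nos: word numbers of the word recognition result candidates
    similar_word_table: word number -> similar-word numbers (like FIG. 4(b))
    Returns {similar-word number: [candidate word numbers]} for every
    similar-word number obtained from at least `min_count` candidates.
    """
    groups = defaultdict(list)
    for word_no in candidate_word_nos:
        for sim_no in similar_word_table.get(word_no, []):
            groups[sim_no].append(word_no)
    return {s: words for s, words in groups.items() if len(words) >= min_count}


# Words No. 4, 5 and 6 are the three Fuchu elementary schools in the example.
table_b = {4: [1, 2], 5: [2], 6: [2]}
print(find_similar_groups([4, 5, 6], table_b))  # {2: [4, 5, 6]}
```

Similar word No. 1 is dropped because only word No. 4 yields it. This matches the determination in step S2 that the similar words exist under No. 2.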

【００３２】[0032]

[Effects of the Invention] As described above, the character recognition device of the present invention reads and recognizes a character string, for example a word. It may then find a plurality of recognition result candidates for the character string, some of which are similar to one another. In that case, one of the similar candidates is selected as the recognition result of the character string; that is, the similar candidates are narrowed down further. Consequently, even when the word dictionary contains many similar words, multiple recognition results are output less often. This reduces the operator's burden of choosing among multiple recognition results.

[Brief description of the drawings]

FIG. 1 is a block diagram showing one embodiment of the character recognition device of the present invention.

FIG. 2 is a flowchart illustrating an example of the operation of the similar word determination unit.

FIG. 3 is a diagram illustrating an example of the word dictionary.

FIG. 4 is a diagram showing an example of the similar word table.

FIG. 5 is a diagram illustrating an example of the similar character table.

FIG. 6 is a diagram illustrating an example of character recognition result candidates.

FIG. 7 is a diagram showing an example of the degree of matching of the words in the word dictionary.

FIG. 8 is a diagram illustrating an example of the re-recognition result.

FIG. 9 is a diagram illustrating an example of the calculation process and result of the "probability".

FIG. 10 is a block diagram of a conventional character recognition device.

[Explanation of symbols]

1 Image input unit
2 Character segmentation unit
3 Character recognition unit
4 Word matching unit
5 Similar word determination unit
6 Word dictionary
7 Similar word table
8 Similar character table
9 Character recognition result candidate
10 Word recognition result candidate
11 Word recognition result
12 Degree of matching
13 Character string
14 Similar word
15 Differentiating character

Claims


1. A character recognition device that reads and recognizes a character string. When there are a plurality of recognition result candidates for the character string and similar recognition result candidates exist among them, the device selects one of the similar recognition result candidates as the recognition result of the character string.

2. A character recognition device that reads and recognizes a character string representing a word. When there are a plurality of recognition result candidates for the character string and similar recognition result candidates exist among them, the device selects one of the similar recognition result candidates as the recognition result of the word.

3. A character recognition device that reads and recognizes a character string representing a word written on a form. When there are a plurality of recognition result candidates for the character string and similar recognition result candidates exist among them, the device selects one of the similar recognition result candidates as the recognition result of the word.

4. The character recognition device according to claim 1, wherein the recognition result is obtained as follows. For each character that differs between the similar recognition result candidates, the device checks a "probability" indicating how certain that character is. It then selects the recognition result candidate that contains the most probable character.

5. A character recognition device comprising:
an image input unit that reads a form and outputs image data;
a character cutout unit that cuts out a character string from the image data;
a character recognition unit that recognizes the characters of the character string one at a time, for every character in the string, and outputs the character recognition result candidates corresponding to each recognized character;
a word matching unit that checks a degree of matching between the character recognition result candidates output by the character recognition unit and each word of a word dictionary, in which a plurality of words to be written on the form are stored in advance, and outputs words with a high degree of matching as word recognition result candidates, which are recognition result candidates of the character string; and
a similar word determination unit that, when there are a plurality of word recognition result candidates and similar word recognition result candidates exist among them, selects one of the similar word recognition result candidates as the word recognition result, which is the recognition result of the character string.

6. The character recognition device according to claim 5, wherein the word matching unit uses the words in the word dictionary that have the same number of characters as the character string. For every character in the character string, it checks whether the corresponding character recognition result candidate is the same as the character at the same position in the word. It takes the number of matching characters as the degree of matching.
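Claim 6 defines the matching degree, and claim 5 describes outputting words with a high degree of matching as candidates. Both can be sketched in Python under two assumptions. First, each character position has a list of recognition candidates, and a position counts as matching when the dictionary word's character is among them. Second, "high degree of matching" is taken to mean the maximum. The function names and the ASCII sample data are hypothetical.

```python
def matching_degree(char_candidates, word):
    """Number of positions whose candidates contain the word's character.

    char_candidates: one list of candidate characters per position in the string
    Returns None when the word's length differs from the string's length,
    because claim 6 only compares words with the same number of characters.
    """
    if len(word) != len(char_candidates):
        return None
    return sum(1 for cands, ch in zip(char_candidates, word) if ch in cands)


def word_candidates(char_candidates, dictionary):
    """Return every same-length word that reaches the highest matching degree."""
    scores = {}
    for word in dictionary:
        degree = matching_degree(char_candidates, word)
        if degree is not None:
            scores[word] = degree
    if not scores:
        return []
    top = max(scores.values())
    return [w for w, d in scores.items() if d == top]


# Hypothetical 3-character string with two candidates per position.
cands = [["c", "e"], ["a", "o"], ["t", "l"]]
words = ["cat", "cot", "eat", "dog", "cart"]
print(word_candidates(cands, words))  # ['cat', 'cot', 'eat']
```

A tie like this leaves several word recognition result candidates. This is the situation the similar word determination unit 5 is meant to resolve.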

7. The character recognition device according to claim 5, wherein, for each character that differs between the similar word recognition result candidates, the similar word determination unit checks a "probability" indicating how certain that character is. It takes the word recognition result candidate that contains the most probable character as the word recognition result.

8. The character recognition device according to claim 4, wherein the "probability" of a character is calculated by adding, to the distance value of that character, half the average of the distance values of the characters similar to it.
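Claim 8 can be written as a formula using notation introduced here for illustration only. Let $d(c)$ be the distance value of character $c$ from the re-recognition, and let $S(c)$ be the set of characters listed as similar to $c$ in the similar character table. The worked example accepts a value of 69 under a threshold of 100, which suggests that a smaller $P(c)$ means a more certain character.

```latex
P(c) = d(c) + \frac{1}{2} \cdot \frac{1}{\lvert S(c) \rvert} \sum_{s \in S(c)} d(s)
```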