JP3255816B2

JP3255816B2 - Character recognition device

Info

Publication number: JP3255816B2
Application number: JP02708095A
Authority: JP
Inventors: 寿男丹羽; 浩司山本; 英嗣前川; 一弘萱嶋
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1995-02-15
Filing date: 1995-02-15
Publication date: 2002-02-12
Anticipated expiration: 2017-02-12
Also published as: JPH08221521A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a character recognition device that reads and recognizes characters written on forms and the like.

[0002]

[Prior Art] Knowledge processing has conventionally been applied to the results of character recognition in order to improve recognition accuracy. In this approach, the per-character recognition results are collated with a knowledge dictionary and corrected to the most probable characters. Depending on the content to be recognized, the knowledge dictionary may be a word dictionary, a place-name dictionary, a personal-name dictionary, a part-number dictionary, or the like.

[0003] FIG. 6 is a block diagram showing the configuration of a conventional character recognition device. In FIG. 6, the character recognition unit 52 reads the form image 51 and outputs, for each character, n candidate characters together with their recognition similarities. A larger recognition similarity indicates that the candidate character is more likely to be correct. The character candidate evaluation value calculation unit 53 obtains a character candidate evaluation value for each candidate character from the similarities output by the character recognition unit 52; for example, each recognition similarity is normalized by dividing it by the recognition similarity of the second candidate. The character string search unit 54 uses the knowledge dictionary 55 to find, within the set of candidate character strings, the combinations of characters that form character strings contained in the knowledge dictionary. The character string candidate evaluation value calculation unit 56 obtains a character string candidate evaluation value based on the number of characters that match the knowledge dictionary 55, the character candidate evaluation values, and so on. The character string candidate selection unit 57 selects the character string candidate with the highest character string candidate evaluation value and outputs it as the corrected character string 58, which is the recognition result.

[0004] Recognizing character strings in this way makes it possible to correct characters that the character recognition unit has misrecognized, and thus to improve recognition accuracy.

[0005]

[Problems to Be Solved by the Invention] In the conventional configuration described above, however, when the image to be recognized is skewed or noisy, the similarities of the candidate characters output by the character recognition unit fluctuate greatly, and the character string candidate evaluation values fluctuate with them. As a result, the knowledge processing may select a wrong candidate.

[0006] In addition, when filling in a form, the writer may mistakenly insert or omit a delimiter (for example 「大字」 (Oaza) or 「字」 (Aza) in an address, or "-" or "/" in a part number). In that case the character positions shift, so the character string search unit cannot collate the input with the knowledge dictionary and cannot retrieve the correct character string candidate.

[0007] The present invention solves these conventional problems. Its object is to raise the character recognition rate by obtaining evaluation values based on the characters that are likely to be correct and by inferring the writer's entry errors.

[0008]

[Means for Solving the Problems] To achieve this object, the present invention uses a maximum similarity difference detection unit to examine the distribution of the similarities that the character recognition unit assigns to the candidate characters, and obtains the character candidate evaluation values based on that distribution. It further infers entry errors made by the writer and deletes or inserts the erroneous characters to obtain the correct character string.

[0009]

[Operation] With the configuration described above, the character candidate evaluation values are obtained from the trend of the similarities of the candidate characters. Even when the image to be recognized contains noise, characters with low reliability therefore contribute less to the evaluation value, and misrecognition is reduced. Furthermore, by inferring the writer's entry errors, the correct character string can be obtained even when the entry contains errors, which improves the character recognition rate.

[0010]

[Embodiments] A first embodiment of the present invention is described below. FIG. 1 shows the overall configuration of the character recognition device of this embodiment.

[0011] In FIG. 1, the character recognition unit 2 first performs character recognition on the form image 1 and outputs, for each character, a candidate character set of n candidate characters from the first to the n-th candidate, together with the recognition similarity of each candidate character. The maximum similarity difference detection unit 3 detects, among the candidate characters output by the character recognition unit 2, the candidate for which the difference between the similarity of the m-th candidate and that of the (m+1)-th candidate is largest. The character candidate evaluation value calculation unit 4 obtains the character candidate evaluation values from the candidate with the largest similarity difference and the similarities of the candidate characters. The character string search unit 5 searches the knowledge dictionary 6 to pick out, from the combinations of the candidate character sets, the combinations that become character string candidates. The character string candidate evaluation value calculation unit 7 calculates a character string candidate evaluation value for each character string candidate found by the character string search unit 5, based on its degree of agreement with the knowledge dictionary and on the character candidate evaluation values. The character string candidate selection unit 8 selects the character string candidate with the largest character string candidate evaluation value.

[0012] The character recognition device configured as above performs character recognition as follows. First, the character recognition unit 2 processes the form image and outputs, for each character, a candidate character set of n candidate characters from the first to the n-th candidate, together with the recognition similarity A(i,j) of each candidate character (i is the character position and j is the candidate rank). The larger the recognition similarity A(i,j), the higher the probability that the candidate character is correct. FIG. 2 shows the result of character recognition in which the first through fifth candidate characters are output for each character.

[0013] For each character position i, the maximum similarity difference detection unit 3 finds the k-th candidate character for which A(i,k) - A(i,k+1) (1 ≤ k < n) is largest. The character candidate evaluation value calculation unit 4 obtains the character candidate evaluation value R(i,j) (i is the character position and j is the candidate rank) using different formulas for the candidates ranked at or above the k-th candidate and for those ranked below it. For example, R(i,j) = A(i,j) / A(i,k+1) can be used for the candidates at or above the k-th candidate (j ≤ k), and R(i,j) = A(i,j) / A(i,1) for the candidates below it (j > k). The candidates at or above the k-th candidate thus receive higher character candidate evaluation values than the lower candidates. By splitting the candidates into groups at the point of the largest similarity difference and using the candidate at that point as the reference, an absolute character candidate evaluation value can be derived from the recognition similarity, which is only a relative value.
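
The grouping step above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `max_gap_rank` and `candidate_scores` are hypothetical, the similarities are assumed to be positive and sorted in descending order, and ties for the largest gap are resolved toward the smaller k, which the text does not specify.

```python
from typing import List


def max_gap_rank(similarities: List[float]) -> int:
    """Return the 1-based rank k that maximizes A(i,k) - A(i,k+1), 1 <= k < n."""
    if len(similarities) < 2:
        raise ValueError("at least two candidates are required")
    gaps = [similarities[j] - similarities[j + 1]
            for j in range(len(similarities) - 1)]
    # list.index returns the first maximum, so ties go to the smaller k (assumption).
    return gaps.index(max(gaps)) + 1


def candidate_scores(similarities: List[float]) -> List[float]:
    """Compute R(i,j) for one character position using the example formulas."""
    k = max_gap_rank(similarities)
    upper_reference = similarities[k]  # A(i,k+1); index k is rank k+1
    lower_reference = similarities[0]  # A(i,1)
    scores = []
    for rank, value in enumerate(similarities, start=1):
        if rank <= k:
            scores.append(value / upper_reference)  # upper group, k-th included
        else:
            scores.append(value / lower_reference)  # lower group
    return scores
```

For the made-up similarities `[90, 85, 50, 45, 40]`, the largest gap lies between the second and third candidates, so k = 2; the first two candidates score above 1 and the remaining three score below 1.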

[0014] The character string search unit 5 extracts, as character string candidates, the character strings in the knowledge dictionary 6 that partially match a combination of the candidate character sets. For example, partial matching between the candidate character sets of FIG. 2 and the knowledge dictionary shown in FIG. 3 yields "FY-38N", "JN-28", "JP-28M", and "LR-V08" as character string candidates. The character string candidate evaluation value calculation unit 7 then obtains the character string candidate evaluation value C for each character string candidate output by the character string search unit 5, as C = ΣB_i. Here B_i = R(i,j) when the character at character position i of the character string candidate appears as the j-th candidate character at position i, and B_i = 0 when it does not appear among the candidate characters at position i. The character string candidate selection unit 8 compares the character string candidate evaluation values of the character string candidates, selects the candidate with the largest value, and outputs it as the corrected character string.
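
The calculation C = ΣB_i and the final selection can be sketched as below. This is an illustrative sketch only: the names `string_score` and `select_best` are hypothetical, the per-position evaluation values are made up rather than taken from FIG. 2, and "partial match" is approximated here as "matches at least one character position", since the text does not define the matching criterion precisely.

```python
from typing import Dict, List

# One dict per character position: candidate character -> evaluation value R(i,j).
PositionScores = List[Dict[str, float]]


def string_score(word: str, scores: PositionScores) -> float:
    """C = sum of B_i, where B_i = R(i,j) if word[i] is a candidate at i, else 0."""
    total = 0.0
    for i, ch in enumerate(word):
        if i < len(scores):
            total += scores[i].get(ch, 0.0)
    return total


def select_best(dictionary: List[str], scores: PositionScores) -> str:
    """Return the dictionary entry with the largest evaluation value C."""
    # Assumption: an entry counts as a candidate if it matches at least one position.
    candidates = [w for w in dictionary if string_score(w, scores) > 0.0]
    if not candidates:
        raise ValueError("no dictionary entry matches the candidate characters")
    return max(candidates, key=lambda w: string_score(w, scores))


# Made-up evaluation values for a five-character field.
example_scores: PositionScores = [
    {"J": 1.8, "I": 0.6},
    {"N": 1.5, "M": 1.4},
    {"-": 2.0},
    {"2": 1.7, "Z": 0.5},
    {"8": 1.6, "B": 1.5},
]
example_dictionary = ["FY-38N", "JN-28", "JP-28M", "LR-V08"]
```

With these made-up values, `select_best(example_dictionary, example_scores)` picks "JN-28", because every one of its characters appears among the candidates at the corresponding position.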

[0015] As described above, obtaining the character candidate evaluation values with reference to the candidate at which the similarity difference between candidates is largest makes it possible to obtain correct character candidate evaluation values even for characters that have similarly shaped counterparts.

[0016] Next, a second embodiment of the present invention is described. FIG. 4 shows the overall configuration of the character recognition device of this embodiment.

[0017] The character recognition unit 2 performs character recognition on the form image 1 and outputs, for each character, a candidate character set of n candidate characters from the first to the n-th candidate, together with the recognition similarity of each candidate character. The character candidate evaluation value calculation unit 4 obtains the character candidate evaluation values from the similarities of the candidate characters. The character string search unit 5 searches the knowledge dictionary 6 to pick out, from the combinations of the candidate character sets, the combinations that become character string candidates. The character string candidate evaluation value calculation unit 7 calculates a character string candidate evaluation value for each character string candidate found by the character string search unit 5, based on its degree of agreement with the knowledge dictionary and on the character candidate evaluation values. The character string candidate selection unit 8 selects the character string candidate with the largest character string candidate evaluation value.

[0018] The erroneous entry inference unit 9 infers, from the character string candidate evaluation value of the character string candidate selected by the character string candidate selection unit 8 and from the candidate character set, whether the writer has made an entry error. When the writer has made an entry error, the delimiter deletion/insertion unit 10 deletes or inserts the erroneously entered character. FIG. 5 shows the internal configuration of the erroneous entry inference unit 9, which is described below.

[0019] The high character evaluation value character selection unit 22 selects the candidate characters with high character evaluation values, and the delimiter search unit 23 searches the candidate characters output by the high character evaluation value character selection unit 22 for delimiters. The erroneous insertion inference unit 24 infers whether the writer has wrongly inserted a delimiter and, if so, outputs the delimiter deletion instruction 30. The misrecognized character addition unit 25 uses the misrecognition dictionary 28 to add easily confused characters to the candidate characters output by the high character evaluation value character selection unit 22. The forward partial match search unit 26 searches the candidate characters output by the high character evaluation value character selection unit 22 for characters that form a forward partial match (prefix match) with a character string in the knowledge dictionary. The erroneous omission inference unit 27 infers whether the writer has wrongly omitted a delimiter and, if so, outputs the delimiter insertion instruction 31.

[0020] The character recognition device configured as above performs character recognition as follows. First, the character recognition unit 2 processes the form image and outputs, for each character, a candidate character set of n candidate characters from the first to the n-th candidate, together with the recognition similarity A(i,j) of each candidate character (i is the character position and j is the candidate rank). The character candidate evaluation value calculation unit 4 obtains the character candidate evaluation value R(i,j) (i is the character position and j is the candidate rank) from the recognition similarity of each candidate character and related information.

[0021] The character string search unit 5 extracts, as character string candidates, the character strings in the knowledge dictionary 6 that partially match a combination of the candidate character sets. For example, partial matching between the candidate character sets of FIG. 2 and the knowledge dictionary shown in FIG. 3 yields "FY-38N", "JN-28", "JP-28M", and "LR-V08" as character string candidates.

[0022] The character string candidate evaluation value calculation unit 7 obtains the character string candidate evaluation value C for each character string candidate output by the character string search unit 5, as C = ΣB_i. Here B_i = R(i,j) when the character at character position i of the character string candidate appears as the j-th candidate character at position i, and B_i = 0 when it does not appear among the candidate characters at position i.

[0023] The character string candidate selection unit 8 compares the character string candidate evaluation values of the character string candidates and selects the candidate with the largest value.

[0024] The high character evaluation value character selection unit 22 selects, from the candidate character set, the candidate characters whose character evaluation value is at or above a certain value. Alternatively, a fixed rank m (1 < m < n) is chosen and the candidate characters ranked within the top m are selected.

[0025] The delimiter search unit 23 searches the candidate characters selected by the high character evaluation value character selection unit 22 for delimiters. When the character string candidate evaluation value of the character string candidate output by the character string candidate selection unit 8 is at or below a certain value, that is, the candidate is unlikely to be correct, and the delimiter search unit 23 has found a delimiter, the erroneous insertion inference unit 24 outputs the position of that delimiter to the delimiter deletion/insertion unit 10 as the delimiter deletion instruction 30. Otherwise, that is, when the character string candidate evaluation value is at or above the certain value (the candidate is likely to be correct) or the delimiter search unit 23 has found no delimiter, the character string candidate output by the character string candidate selection unit 8 is passed on to the misrecognized character addition unit 25.
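
The decision made by the erroneous insertion inference unit can be sketched as follows. The function name `infer_wrong_insertion`, the `DELIMITERS` set, and the threshold parameter are hypothetical. The sketch also assumes that the first delimiter found is the one reported, because the text does not say how to choose among several delimiters.

```python
from typing import List, Optional, Set

# Delimiters named in the text for part numbers; an address field would use others.
DELIMITERS: Set[str] = {"-", "/"}


def infer_wrong_insertion(
    best_score: float,
    score_threshold: float,
    strong_candidates: List[Set[str]],
) -> Optional[int]:
    """Return the 0-based position of a suspected wrongly inserted delimiter.

    strong_candidates holds, per character position, the candidates that passed
    the high-evaluation-value selection. None means "no deletion instruction".
    """
    if best_score > score_threshold:
        return None  # the selected string is considered reliable
    for position, chars in enumerate(strong_candidates):
        if chars & DELIMITERS:
            return position  # assumption: report the first delimiter found
    return None
```

A returned position plays the role of the delimiter deletion instruction 30; `None` corresponds to passing the selected string on unchanged.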

[0026] The misrecognition dictionary 28 stores, for characters that the character recognition unit 2 is likely to get wrong, pairs consisting of the misrecognized result character and the correct character. For each candidate character selected by the high character evaluation value character selection unit 22 that matches a misrecognized result character in the misrecognition dictionary 28, the misrecognized character addition unit 25 adds the corresponding correct character from the misrecognition dictionary 28 to the candidate characters selected by the high character evaluation value character selection unit 22.
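
A minimal sketch of this candidate expansion is shown below. The function name `add_confusable_characters` and the confusion pairs are hypothetical; the source does not list the contents of the misrecognition dictionary 28.

```python
from typing import Dict, List, Set


def add_confusable_characters(
    strong_candidates: List[Set[str]],
    confusions: Dict[str, Set[str]],
) -> List[Set[str]]:
    """Add, for each selected candidate, the correct characters it may stand for.

    confusions maps a misrecognized result character to its possible correct
    characters, mirroring the (result, correct) pairs described in the text.
    """
    expanded = []
    for chars in strong_candidates:
        extra: Set[str] = set()
        for ch in chars:
            extra |= confusions.get(ch, set())
        expanded.append(chars | extra)  # original candidates are kept
    return expanded


# Made-up confusion pairs for illustration only.
example_confusions = {"0": {"O"}, "1": {"I"}}
```

The input list is left unmodified; a new list of sets is returned so that the original selection can still be used elsewhere.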

[0027] The forward partial match search unit 26 searches the knowledge dictionary for character strings that form a forward partial match (prefix match) with the combinations of the candidate characters selected by the high character evaluation value character selection unit 22 and the candidate characters added by the misrecognized character addition unit 25, and whose first non-matching character is a delimiter. When the character string candidate evaluation value of the character string candidate output by the character string candidate selection unit 8 is at or below a certain value, that is, the candidate is unlikely to be correct, and the forward partial match search unit 26 has found such a character string, the erroneous omission inference unit 27 outputs the character position of the first non-matching character to the delimiter deletion/insertion unit 10 as the delimiter insertion instruction 31. Otherwise, that is, when the character string candidate evaluation value is at or above the certain value (the candidate is likely to be correct) or the forward partial match search unit 26 has found no such character string, the character string candidate output by the character string candidate selection unit 8 is output as the corrected character string 11.
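
The forward partial match and the omission inference can be sketched as follows. The names `first_mismatch` and `infer_omitted_delimiter` and the `DELIMITERS` set are hypothetical. The sketch assumes that at least one leading character must match and that the first qualifying dictionary entry is used; the text specifies neither.

```python
from typing import List, Optional, Set, Tuple

DELIMITERS: Set[str] = {"-", "/"}  # examples given in the text for part numbers


def first_mismatch(word: str, candidates: List[Set[str]]) -> int:
    """Index of the first character of word not found among the candidates at
    the same position; len(word) if every character matches."""
    for i, ch in enumerate(word):
        if i >= len(candidates) or ch not in candidates[i]:
            return i
    return len(word)


def infer_omitted_delimiter(
    best_score: float,
    score_threshold: float,
    candidates: List[Set[str]],
    dictionary: List[str],
) -> Optional[Tuple[int, str]]:
    """Return (position, delimiter) for a suspected omitted delimiter, or None."""
    if best_score > score_threshold:
        return None  # the selected string is considered reliable
    for word in dictionary:
        i = first_mismatch(word, candidates)
        # Assumption: require a non-empty matching prefix before the mismatch.
        if 0 < i < len(word) and word[i] in DELIMITERS:
            return i, word[i]
    return None
```

For a field written as "JN28" with the hyphen left out, the dictionary entry "JN-28" matches "J" and "N" and then fails on "-", so position 2 is reported as the insertion point.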

[0028] The delimiter deletion/insertion unit 10 edits the candidate character set according to the delimiter deletion instruction 30 or the delimiter insertion instruction 31. For a delimiter deletion instruction 30, it deletes the candidate characters at the delimiter's character position from the candidate character set and shifts the candidate characters at all subsequent positions forward by one position. For a delimiter insertion instruction 31, it shifts the candidate characters at and after the delimiter's character position back by one position and sets the first candidate at the delimiter's position to the delimiter. The edited candidate character set is then output to the character string search unit 5.
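
The two edits can be sketched on a list of ranked candidate lists, one list per character position. The function names are hypothetical. The sketch assumes that the inserted position holds the delimiter as its only candidate, whereas the text only states that the first candidate there becomes the delimiter.

```python
from typing import List

CandidateSet = List[List[str]]  # per position: candidates in rank order


def delete_position(candidates: CandidateSet, position: int) -> CandidateSet:
    """Drop the candidates at `position`; later positions shift forward by one."""
    if not 0 <= position < len(candidates):
        raise IndexError("position outside the candidate set")
    return candidates[:position] + candidates[position + 1:]


def insert_delimiter(
    candidates: CandidateSet, position: int, delimiter: str
) -> CandidateSet:
    """Shift positions from `position` onward back by one and place the
    delimiter as the first (here: only) candidate at `position`."""
    if not 0 <= position <= len(candidates):
        raise IndexError("position outside the candidate set")
    return candidates[:position] + [[delimiter]] + candidates[position:]
```

Both functions return a new candidate set, which would then be fed back to the dictionary search, matching the loop back to the character string search unit 5 described above.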

[0029] Using the misrecognition dictionary 28 in this way makes it possible to infer erroneous omissions while taking the errors of the character recognition unit 2 into account.

[0030] A recognition experiment was carried out using the first and second embodiments. In recognizing part numbers written on forms, the recognition rate was 29.5% without post-processing, 81.9% with the first embodiment only, and 93.7% with both the first and second embodiments.

[0031]

[Effects of the Invention] As is clear from the embodiments above, the character recognition device of the present invention obtains the character candidate evaluation values with reference to the candidate at which the similarity difference between candidates is largest, so correct character candidate evaluation values can be obtained even for characters that have similarly shaped counterparts. In addition, because a misrecognition dictionary is used, erroneous omissions can be inferred while taking the errors of the character recognition unit into account, and the correct character string can be determined even when the writer has mistakenly omitted a delimiter. Performing character recognition with this configuration improves the recognition rate, and the effect is significant.

[Brief description of the drawings]

FIG. 1 is a configuration diagram of the character recognition device according to the first embodiment of the present invention.

FIG. 2 is an output diagram of the character recognition unit according to the first embodiment of the present invention.

FIG. 3 is an output diagram of the character string search unit according to the first embodiment of the present invention.

FIG. 4 is a configuration diagram of the character recognition device according to the second embodiment of the present invention.

FIG. 5 is a configuration diagram of the erroneous entry inference unit according to the second embodiment of the present invention.

FIG. 6 is a block diagram showing the configuration of a conventional character recognition device.

[Explanation of symbols]

1 Form image
2 Character recognition unit
3 Maximum similarity difference detection unit
4 Character candidate evaluation value calculation unit
5 Character string search unit
6 Knowledge dictionary
7 Character string candidate evaluation value calculation unit
8 Character string candidate selection unit
9 Erroneous entry inference unit
10 Delimiter deletion/insertion unit
11 Corrected character string
21 Candidate character set
22 High character evaluation value character selection unit
23 Delimiter search unit
24 Erroneous insertion inference unit
25 Misrecognized character addition unit
26 Forward partial match search unit
27 Erroneous omission inference unit
28 Misrecognition dictionary
29 Corrected character string
30 Delimiter deletion instruction
31 Delimiter insertion instruction

Continuation of front page
(72) Inventor 前川英嗣, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma-shi, Osaka
(72) Inventor 萱嶋一弘, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma-shi, Osaka
(56) References: JP-A-4-205457 (JP, A); JP-A-5-62022 (JP, A)

Claims

(57) [Claims]

1. A character recognition device comprising: a character recognition unit that recognizes an input image character by character and outputs, for each character, n candidate characters and their recognition similarities; a maximum similarity difference detection unit that, where A(i,j) denotes the recognition similarity of the j-th candidate at character position i, detects the candidate for which the recognition similarity difference A(i,j) - A(i,j+1) is largest; a character candidate evaluation value calculation unit that, where the k-th candidate is the candidate character for which A(i,k) - A(i,k+1) is largest, uses different calculation methods for the higher-ranked candidate characters including the k-th candidate and for the candidate characters ranked below the k-th candidate, and obtains higher character candidate evaluation values for the higher-ranked candidate characters including the k-th candidate than for the lower-ranked candidate characters; a character string search unit that searches for character strings contained in a knowledge dictionary that partially match combinations of the candidate characters, and outputs them as character string candidates; a character string candidate evaluation value calculation unit that obtains a character string candidate evaluation value for each character string candidate based on the character candidate evaluation values; and a character string candidate selection unit that selects the character string candidate with the largest character string candidate evaluation value.

2. A character recognition device comprising: a character recognition unit that recognizes an input image character by character and outputs, for each character, n candidate characters and their recognition similarities; a character candidate evaluation value calculation unit that obtains a character candidate evaluation value from the recognition similarity of each candidate character; a character string search unit that searches for character strings contained in a knowledge dictionary that partially match combinations of the candidate characters, and outputs them as character string candidates; a character string candidate evaluation value calculation unit that obtains a character string candidate evaluation value for each character string candidate based on the character candidate evaluation values; a character string candidate selection unit that selects the character string candidate with the largest character string candidate evaluation value; an erroneous entry inference unit that infers an entry error by the writer from the selected character string candidate and the candidate character set; and a delimiter deletion/insertion unit that deletes or inserts the erroneously entered character, wherein the erroneous entry inference unit comprises: a high character evaluation value character selection unit that selects the candidate characters whose character evaluation value is at or above a certain value; a delimiter search unit that searches the candidate characters selected by the high character evaluation value character selection unit for delimiters; an erroneous insertion inference unit that, when the character string candidate evaluation value is at or below a certain value and the delimiter search unit has found a delimiter, infers that the delimiter was wrongly inserted by the writer and deletes the candidate characters at the character position of the delimiter from the candidate character set; a misrecognized character addition unit that, when a candidate character selected by the high character evaluation value character selection unit is a misrecognized result character in a misrecognition dictionary, adds the corresponding correct character from the misrecognition dictionary to the candidate characters; a forward partial match search unit that performs a forward partial match search between the character strings of the knowledge dictionary and the combinations of the candidate characters selected by the high character evaluation value character selection unit and the candidate characters added by the misrecognized character addition unit; and an erroneous omission inference unit that, when the character string candidate evaluation value is at or below a certain value and a forward partial match character string has been found, infers that the writer wrongly omitted a delimiter at the first non-matching character position and inserts a delimiter at that first non-matching character position.