JP3085107B2

JP3085107B2 - Character recognition device

Info

Publication number: JP3085107B2
Application number: JP06268640A
Authority: JP
Inventors: 寿男丹羽; 浩司山本; 英嗣前川; 一弘萱嶋
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1994-11-01
Filing date: 1994-11-01
Publication date: 2000-09-04
Anticipated expiration: 2015-09-04
Also published as: JPH08129616A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a character recognition device that reads and recognizes characters written on forms and similar documents.

[0002]

[Prior Art] Knowledge processing has conventionally been applied to the results of character recognition in order to improve recognition accuracy. In this approach, the recognition result for each character is compared against a knowledge dictionary and corrected to the most probable character. Depending on what is being recognized, the knowledge dictionary may be a word dictionary, a place name dictionary, a personal name dictionary, a part number dictionary, or the like.

[0003] FIG. 6 shows the configuration of a conventional character recognition device, and its operation is described below with reference to this figure. The character recognition unit 52 reads the form image 51 and outputs n candidate characters for each character. The character string search unit 53 uses the knowledge dictionary 54 to find, from the set of candidate character strings, the combinations of characters that form character strings contained in the knowledge dictionary 54. The character string candidate evaluation value calculation unit 55 then computes a character string candidate evaluation value based on factors such as the number of characters that match the knowledge dictionary 54 and the similarities reported by the character recognition unit 52. The character string candidate selection unit 56 selects the character string candidate with the highest evaluation value and outputs it as the recognition result. Recognizing whole character strings in this way makes it possible to correct characters that the character recognition unit 52 misrecognized, which improves recognition accuracy.

[0004]

[Problems to be Solved by the Invention] In the conventional configuration described above, however, when the image to be recognized is skewed or noisy, the similarities of the candidate characters output by the character recognition unit 52 fluctuate widely, and the character string candidate evaluation values fluctuate with them. As a result, the knowledge processing may select the wrong candidate.

[0005] In addition, a person filling in a form may mistakenly insert or omit a delimiter (for example, 「町」 (town) or 「字」 (aza, a district subdivision) in an address, or "-" or "/" in a part number). In that case the character positions shift, so the character string search unit cannot match the entry against the knowledge dictionary and cannot find the correct character string candidate.

[0006] The present invention solves these conventional problems. Its object is to raise the character recognition rate by computing evaluation values from the characters that are most reliable and by inferring the writer's entry errors.

[0007]

[Means for Solving the Problems] To achieve the above object, the character string candidate evaluation value calculation unit of the present invention obtains the character string candidate evaluation value from the distribution of the similarities that the character recognition unit assigns to the candidate characters and from the appearance frequency of characters in the knowledge dictionary. In addition, the invention infers entry errors made by the writer and obtains the correct character string by deleting and inserting the erroneous characters.

[0008]

[Operation] With the above configuration, even when the image to be recognized contains noise, the character string candidate evaluation value is obtained from the tendency of the similarities of the candidate characters, so characters with low reliability contribute less to the evaluation value and misrecognition is reduced. Furthermore, by inferring the writer's entry errors, the correct character string can be obtained even when the entry contains an error, which improves the character recognition rate.

[0009]

[Embodiment] An embodiment of the present invention is described below. FIG. 1 shows the overall configuration of the character recognition device of this embodiment.

[0010] The character recognition unit 2 performs character recognition on the form image 1 and, for each character, outputs a candidate character set consisting of n candidate characters, from the first candidate to the n-th candidate, together with the recognition similarity of each candidate character.

[0011] The character string search unit 3 searches the knowledge dictionary 4 and selects, from the combinations of the candidate character sets, those combinations that become character string candidates. The character string candidate evaluation value calculation unit 5 computes a character string candidate evaluation value for each character string candidate found by the character string search unit 3, based on its degree of match with the knowledge dictionary 4, the recognition similarities from the character recognition unit 2, and the appearance frequency of characters in the knowledge dictionary 4. The character string candidate selection unit 6 selects the character string candidate with the largest character string candidate evaluation value.

[0012] The erroneous entry inference unit 7 infers, from the character string candidate evaluation value of the character string candidate selected by the character string candidate selection unit 6 and from the candidate character set, whether the writer made an entry error. When an entry error is found, the delimiter deletion/insertion unit 8 deletes or inserts the erroneously entered character.

[0013] FIG. 2 shows the internal configuration of the character string candidate evaluation value calculation unit. The similarity evaluation unit 12 performs a similarity evaluation from the tendency of the similarities of the candidate characters, and the character appearance frequency evaluation unit 13 performs a character appearance frequency evaluation from the appearance frequency information of the characters contained in the knowledge dictionary. The character evaluation value calculation unit 14 obtains the character evaluation value of each candidate character from the similarity evaluation and the character appearance frequency evaluation. The non-candidate character evaluation unit 15 evaluates characters that are not among the candidates. The character string evaluation value derivation unit 16 obtains the character string evaluation value of each character string candidate from the character evaluation values of its characters.

[0014] FIG. 3 shows the internal configuration of the erroneous entry inference unit. The high character evaluation value character selection unit 22 selects candidate characters with high character evaluation values, and the delimiter search unit 23 searches the candidate characters output by the high character evaluation value character selection unit 22 for delimiters. The erroneous insertion inference unit 24 infers whether the writer inserted a delimiter by mistake and, if so, outputs a delimiter deletion instruction 28. The prefix match search unit 25 searches the candidate characters output by the high character evaluation value character selection unit 22 for characters that prefix-match a character string contained in the knowledge dictionary. The erroneous omission inference unit 26 infers whether the writer omitted a delimiter by mistake and, if so, outputs a delimiter insertion instruction 29.

[0015] The character recognition device configured as above performs character recognition as follows. First, the character recognition unit 2 processes the form image 1 and, for each character, produces a candidate character set consisting of n candidate characters, from the first candidate to the n-th candidate, together with the recognition similarity Aij of each candidate character (where i is the character position and j is the candidate rank). FIG. 4 shows the result of character recognition by the character recognition unit 2, with the first through fifth candidate characters obtained for each character. The character string search unit 3 extracts, as character string candidates, those character strings in the knowledge dictionary 4 that partially match a combination of the candidate character sets. For example, partially matching the candidate character sets of FIG. 4 against the knowledge dictionary 4 shown in FIG. 5 extracts "FY-38N", "JN-28", "JP-28M", and "LR-V08" as character string candidates.
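
As an illustration only, the partial-match search could be sketched in Python as follows. The text does not define the matching rule precisely, so the function name, the `min_matches` threshold, and the sample data are assumptions for this sketch; they are not the contents of FIG. 4 or FIG. 5.

```python
def search_string_candidates(candidate_sets, dictionary, min_matches=2):
    """Return dictionary strings that partially match the candidate sets.

    candidate_sets[i] lists the candidate characters at position i.
    A position matches when the dictionary character is one of the
    candidates there. min_matches is a hypothetical threshold.
    """
    results = []
    for word in dictionary:
        matches = sum(
            1
            for i, ch in enumerate(word)
            if i < len(candidate_sets) and ch in candidate_sets[i]
        )
        if matches >= min_matches:
            results.append(word)
    return results


# Hypothetical recognizer output and dictionary (illustrative values only).
candidate_sets = [
    ["J", "L", "F"],
    ["P", "R", "N"],
    ["-", "1"],
    ["2", "3", "V"],
    ["8", "0"],
    ["M", "N"],
]
dictionary = ["JP-28M", "LR-V08", "AB-12C"]
candidates = search_string_candidates(candidate_sets, dictionary)
```

With this sample data, "AB-12C" matches at only one position and is dropped, while the other two entries are kept as character string candidates.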

[0016] For each character string candidate output by the character string search unit 3, the similarity evaluation unit 12 examines, at each character position, the similarity of the candidate character that matches the character string candidate, and outputs a similarity evaluation value Ri for character position i. When the character that matches the character string candidate is the j-th candidate, Ri is a function of Aij; for example, Ri = Aij. Alternatively, Ri can be normalized by the recognition similarity Ai1 of the first candidate, giving Ri = Aij / Ai1. If no candidate character matches the character string candidate, Ri = 0. The character appearance frequency evaluation unit 13 obtains the frequency probability Pi with which each character of a character string candidate output by the character string search unit 3 appears at the same character position i in the character strings contained in the knowledge dictionary 4. For example, the frequency probability P1 of the character string candidate "FY-38N" is the probability that "F" appears at character position 1 among the character strings contained in the knowledge dictionary.
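
A minimal sketch of these two evaluations, assuming the recognizer output is held as parallel lists of candidate characters and similarities, might look like this. The function names, the `normalize` flag, and the sample values are hypothetical.

```python
def similarity_evaluation(word, candidate_sets, similarities, normalize=False):
    """R_i per position: A_ij of the matching candidate, or 0 if none matches."""
    scores = []
    for i, ch in enumerate(word):
        if i < len(candidate_sets) and ch in candidate_sets[i]:
            j = candidate_sets[i].index(ch)
            a_ij = similarities[i][j]
            # Optional normalization by the first candidate: R_i = A_ij / A_i1.
            scores.append(a_ij / similarities[i][0] if normalize else a_ij)
        else:
            scores.append(0.0)
    return scores


def position_frequency(word, dictionary):
    """P_i per position: share of dictionary strings with the same character at i."""
    probs = []
    for i, ch in enumerate(word):
        hits = sum(1 for entry in dictionary if i < len(entry) and entry[i] == ch)
        probs.append(hits / len(dictionary))
    return probs


# Hypothetical two-character example.
candidate_sets = [["J", "L"], ["P", "R"]]
similarities = [[0.8, 0.4], [0.9, 0.6]]
dictionary = ["JP", "LP", "LR", "JN"]

r_plain = similarity_evaluation("LP", candidate_sets, similarities)
r_norm = similarity_evaluation("LP", candidate_sets, similarities, normalize=True)
p = position_frequency("LP", dictionary)
```

Here "L" is the second candidate at position 1, so its plain score is 0.4 and its normalized score is 0.4 / 0.8.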

[0017] The character evaluation value calculation unit 14 obtains the character evaluation value Bi at each character position of a character string candidate as Bi = Ri × f(Pi), where f(Pi) is a monotonically decreasing function; for example, f(Pi) = 1 / Pi. Lowering the character evaluation value Bi for characters with a higher frequency probability Pi gives frequently occurring characters less influence on the choice of the recognized character string. This is because a frequently occurring character (for example, a delimiter) is contained in many character strings, so relying on that character alone to infer the correct character string carries a high risk of misrecognition.
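
The weighting can be illustrated with a short Python sketch that uses the example f(Pi) = 1 / Pi from the text. The function name and the numeric inputs are assumptions chosen only to show the effect.

```python
def character_evaluation(r_i, p_i):
    """B_i = R_i * f(P_i), with f(P_i) = 1 / P_i as in the example above."""
    if p_i <= 0:
        raise ValueError("p_i must be positive for a character taken from the dictionary")
    return r_i * (1.0 / p_i)


# Same similarity, different positional frequency (hypothetical values).
rare = character_evaluation(0.8, 0.1)    # character found at this position in 10% of entries
common = character_evaluation(0.8, 0.9)  # e.g. a delimiter found in 90% of entries
```

With equal similarity, the rarely occurring character receives the larger evaluation value, so it carries more weight in choosing the character string.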

[0018] The non-candidate character evaluation unit 15 obtains the character evaluation value Bi when no candidate character matches the character string candidate. In that case Bi = −a × (Ai1 − Ain), where a is a constant satisfying 0 < a < 1. For example, when the "F" at character position 1 of the character string candidate "FY-38N" is not among the first through n-th candidate characters, the character evaluation value B1 is B1 = −a × (A11 − A1n).
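
As a sketch under stated assumptions, the penalty for a position with no matching candidate could be computed as below. The value a = 0.5 and the similarity lists are hypothetical; the text only requires 0 < a < 1.

```python
def non_candidate_evaluation(similarities_at_i, a=0.5):
    """B_i = -a * (A_i1 - A_in) for a position where no candidate matches.

    similarities_at_i holds A_i1 .. A_in, ordered from first to n-th candidate.
    """
    if not 0 < a < 1:
        raise ValueError("a must satisfy 0 < a < 1")
    return -a * (similarities_at_i[0] - similarities_at_i[-1])


# Hypothetical similarity profiles for one character position.
confident = non_candidate_evaluation([0.9, 0.5, 0.3, 0.2, 0.1])
ambiguous = non_candidate_evaluation([0.50, 0.48, 0.46, 0.45, 0.44])
```

One reading of this formula is that a miss costs more where the recognizer clearly preferred its first candidate (a wide spread between A_i1 and A_in) than where all candidates scored about the same.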

[0019] The character string evaluation value derivation unit 16 obtains the character string candidate evaluation value C from the character evaluation values Bi of the characters of the character string candidate, as C = ΣBi. The character string candidate selection unit 6 compares the character string candidate evaluation values of the character string candidates and selects the candidate with the largest value. The high character evaluation value character selection unit 22 selects, from the candidate character set, the candidate characters whose character evaluation value is at or above a fixed value. Alternatively, it selects the candidate characters ranked within the m-th candidate (1 < m < n).
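
The summation and selection steps might be sketched as follows. The per-position values are invented for illustration, and the function names are hypothetical.

```python
def string_evaluation(char_values):
    """C = sum of B_i over the character positions of one candidate."""
    return sum(char_values)


def select_best(candidates_with_values):
    """Return the (string, C) pair with the largest C."""
    scored = [(s, string_evaluation(b)) for s, b in candidates_with_values.items()]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical character evaluation values B_i for two candidates.
values = {
    "JP-28M": [1.6, 1.8, 0.9, 1.2, 1.5, 2.0],
    "LR-V08": [0.8, 1.2, 0.9, 2.4, 0.6, -0.3],
}
best, best_score = select_best(values)
```

The negative last value for "LR-V08" stands for a position where no candidate matched, which pulls that candidate's total down.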

[0020] The delimiter search unit 23 searches the candidate characters selected by the high character evaluation value character selection unit 22 for delimiters. When the character string candidate evaluation value of the character string candidate output by the character string candidate selection unit 6 is at or below a fixed value and the delimiter search unit 23 has found a delimiter, the erroneous insertion inference unit 24 outputs the position of that delimiter to the delimiter deletion/insertion unit 8 as a delimiter deletion instruction. If, on the other hand, the character string candidate evaluation value is at or above the fixed value, or the delimiter search unit 23 found no delimiter, the character string candidate output by the character string candidate selection unit 6 is passed to the prefix match search unit 25.
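
A rough Python sketch of this decision is given below. The delimiter set, the threshold, the choice of the first delimiter found, and the treatment of a score exactly equal to the threshold are all assumptions, since the text leaves them open.

```python
DELIMITERS = {"-", "/"}  # delimiters the text mentions for part numbers


def infer_erroneous_insertion(best_score, high_value_sets, threshold):
    """Return the position of a delimiter to delete, or None.

    high_value_sets[i] holds the candidate characters at position i that
    passed the high-evaluation selection (unit 22).
    """
    if best_score > threshold:
        return None  # the selected candidate is trusted; no insertion suspected
    for i, chars in enumerate(high_value_sets):
        if any(ch in DELIMITERS for ch in chars):
            return i  # position reported in the delimiter deletion instruction
    return None  # no delimiter found; fall through to the omission check


# Hypothetical case: the dictionary expects "AB12" but the writer entered "AB-12".
high_value_sets = [["A"], ["B"], ["-"], ["1"], ["2"]]
deletion_position = infer_erroneous_insertion(2.0, high_value_sets, threshold=3.0)
trusted = infer_erroneous_insertion(5.0, high_value_sets, threshold=3.0)
```

Positions are zero-based in this sketch, so the delimiter at the third character is reported as index 2.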

[0021] The prefix match search unit 25 searches the knowledge dictionary 4 for character strings that prefix-match a combination of the candidate characters selected by the high character evaluation value character selection unit 22 and whose first non-matching character in the knowledge dictionary is a delimiter. When the character string candidate evaluation value of the character string candidate output by the character string candidate selection unit 6 is at or below a fixed value and the prefix match search unit 25 has found a prefix-matching character string, the erroneous omission inference unit 26 outputs the character position of the first non-matching character to the delimiter deletion/insertion unit as a delimiter insertion instruction. If, on the other hand, the character string candidate evaluation value is at or above the fixed value, or the prefix match search unit 25 found no prefix-matching character string, the character string candidate output by the character string candidate selection unit 6 is output as the corrected character string.
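
The omission check could be sketched as follows, again only as an illustration. The function name, the threshold, returning the first dictionary entry that qualifies, and requiring at least one matching leading character are assumptions of this sketch.

```python
def infer_erroneous_omission(best_score, high_value_sets, dictionary,
                             threshold, delimiters=("-", "/")):
    """Return (position, delimiter) to insert, or None."""
    if best_score > threshold:
        return None  # the selected candidate is trusted as it is
    for word in dictionary:
        # Length of the leading run in which the dictionary character is
        # among the high-evaluation candidates at the same position.
        k = 0
        while (k < len(word) and k < len(high_value_sets)
               and word[k] in high_value_sets[k]):
            k += 1
        # The first non-matching dictionary character must be a delimiter.
        if 0 < k < len(word) and word[k] in delimiters:
            return k, word[k]
    return None


# Hypothetical case: the dictionary holds "JP-28M" but the writer entered "JP28M".
high_value_sets = [["J"], ["P"], ["2"], ["8"], ["M"]]
insertion = infer_erroneous_omission(2.0, high_value_sets, ["JP-28M"], threshold=3.0)
```

The prefix "JP" matches, and the next dictionary character is "-", so the sketch proposes inserting a delimiter at zero-based index 2.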

[0022] The delimiter deletion/insertion unit 8 edits the candidate character set according to the delimiter deletion instruction 28 or the delimiter insertion instruction 29. For a delimiter deletion instruction 28, it deletes the candidate characters at the character position of the delimiter and shifts the candidate characters at all later character positions forward by one position. For a delimiter insertion instruction 29, it shifts the candidate characters at and after the character position of the delimiter back by one position and sets the first candidate at the delimiter's character position to the delimiter. The edited candidate character set is then output to the character string search unit.
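
Treating the candidate character set as a list of per-position candidate lists, the two edits reduce to list slicing. This is a minimal sketch: the function names are hypothetical, and placing the delimiter as the only candidate at the inserted position is an assumption, since the text specifies only the first candidate.

```python
def delete_delimiter(candidate_sets, position):
    """Drop the candidates at `position`; later positions move forward by one."""
    return candidate_sets[:position] + candidate_sets[position + 1:]


def insert_delimiter(candidate_sets, position, delimiter):
    """Move positions from `position` onward back by one and put the
    delimiter at `position` as its first (here: only) candidate."""
    return candidate_sets[:position] + [[delimiter]] + candidate_sets[position:]


# Hypothetical candidate sets.
after_delete = delete_delimiter([["A"], ["-"], ["1"]], 1)
after_insert = insert_delimiter([["J"], ["P"], ["2"]], 2, "-")
```

Both functions return new lists, so the edited set can be passed back to the character string search while the original recognizer output stays unchanged.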

[0023]

[Effects of the Invention] As described above, the character recognition device of the present invention computes the character string evaluation value from the tendency of the recognition similarities and the character appearance frequency in the knowledge dictionary, so the corrected character string can be determined more accurately even when the image to be recognized contains noise. In addition, the erroneous entry inference makes it possible to determine the correct character string even when the writer has mistakenly inserted or omitted a delimiter. Performing character recognition in this way improves the recognition rate, which is a significant effect.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a character recognition device according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of the character string candidate evaluation value calculation unit of the embodiment.

FIG. 3 is a block diagram showing the configuration of the erroneous entry inference unit of the embodiment.

FIG. 4 is a diagram showing the output of the character recognition unit of the embodiment.

FIG. 5 is a diagram showing the output of the character string search unit of the embodiment.

FIG. 6 is a configuration diagram of a conventional character recognition device.

[Explanation of symbols]

1 Form image
2 Character recognition unit
3 Character string search unit
4 Knowledge dictionary
5 Character string candidate evaluation value calculation unit
6 Character string candidate selection unit
7 Erroneous entry inference unit
8 Delimiter deletion/insertion unit
9 Corrected character string
11 Character string candidate
12 Similarity evaluation unit
13 Character appearance frequency evaluation unit
14 Character evaluation value calculation unit
15 Non-candidate character evaluation unit
16 Character string evaluation value derivation unit
17 Character string candidate evaluation value
21 Candidate character set
22 High character evaluation value character selection unit
23 Delimiter search unit
24 Erroneous insertion inference unit
25 Prefix match search unit
26 Erroneous omission inference unit
27 Corrected character string
28 Delimiter deletion instruction
29 Delimiter insertion instruction

Continuation of the front page

(72) Inventor: Kazuhiro Kayashima, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma-shi, Osaka

(56) References: JP-A-4-120679 (JP, A); JP-A-4-205457 (JP, A); JP-A-5-62022 (JP, A); JP-A-2-116988 (JP, A); JP-A-5-174195 (JP, A); JP-A-2-121078 (JP, A); JP-A-4-77980 (JP, A)

(58) Fields searched (Int. Cl.7, DB name): G06K 9/72

Claims

(57) [Claims]

1. A character recognition device comprising: a character recognition unit that recognizes an input image character by character and outputs, for each character, n candidate characters and their recognition similarities; a character string search unit that searches for character strings contained in a knowledge dictionary that partially match a combination of the candidate characters and outputs them as character string candidates; a character string candidate evaluation value calculation unit that obtains a character string candidate evaluation value based on the recognition similarities of the n candidate characters at each character position of each character string candidate and on the appearance frequency at each character position of the character strings contained in the knowledge dictionary; a character string candidate selection unit that selects the character string candidate with the highest character string candidate evaluation value; an erroneous entry inference unit that infers an erroneous entry by the writer from the selected character string candidate and the candidate character set; and a delimiter deletion/insertion unit that deletes or inserts the erroneously entered character.

2. The character recognition device according to claim 1, wherein the character string candidate evaluation value calculation unit comprises: a similarity evaluation unit that performs a similarity evaluation from the recognition similarities of the n candidate characters at each character position of a character string candidate; a character appearance frequency evaluation unit that evaluates the appearance frequency of characters at each character position of the character strings contained in the knowledge dictionary; a character evaluation value calculation unit that obtains a character evaluation value at each character position of the character string candidate from the similarity evaluation and the character appearance frequency evaluation; and a character string evaluation value derivation unit that obtains a character string evaluation value from the character evaluation values.

3. The character recognition device according to claim 1, wherein the character string candidate evaluation value calculation unit comprises: a similarity evaluation unit that performs a similarity evaluation from the recognition similarities of the n candidate characters at each character position of a character string candidate; a character appearance frequency evaluation unit that evaluates the appearance frequency of characters at each character position of the character strings contained in the knowledge dictionary; a non-candidate character evaluation unit that, based on the similarity evaluation and the character appearance frequency evaluation, performs a non-candidate character evaluation when no matching character exists among the candidate characters; and a character string evaluation value derivation unit that obtains a character string evaluation value from the evaluation values.

4. The character recognition device according to claim 1, wherein the erroneous entry inference unit comprises: a high character evaluation value character selection unit that selects candidate characters whose character evaluation value is equal to or greater than a predetermined value; a delimiter search unit that searches the candidate characters selected by the high character evaluation value character selection unit for a delimiter; and an erroneous insertion inference unit that, when the character string candidate evaluation value is equal to or less than a certain value and the delimiter search unit has found a delimiter, infers that the writer inserted the delimiter by mistake and deletes the candidate characters at the character position of the delimiter.

5. The character recognition device according to claim 1, wherein the erroneous entry inference unit comprises: a high character evaluation value character selection unit that selects candidate characters whose character evaluation value is equal to or greater than a certain value; a prefix match search unit that performs a prefix match search between the character strings of the knowledge dictionary and combinations of the candidate characters selected by the high character evaluation value character selection unit; and an erroneous omission inference unit that, when the character string candidate evaluation value is equal to or less than a certain value and a prefix-matching character string has been found, infers that the writer omitted a delimiter by mistake at the first character position that did not match, and inserts a delimiter into the candidate character set at that first non-matching character position.

6. The character recognition device according to claim 2, wherein the character string candidate evaluation value calculation unit obtains the character string evaluation value from the product of Ri and f(Pi), where Ri is the recognition similarity of the candidate character at character position i in the character string candidate, Pi is the probability that each character of the character string candidate appears at the same character position i in the character strings contained in the knowledge dictionary, and f(Pi) is a monotonically decreasing function.