JP3255816B2 - Character recognition device - Google Patents

Character recognition device

Info

Publication number
JP3255816B2
Authority
JP
Japan
Prior art keywords
character
candidate
character string
evaluation value
unit
Legal status
Expired - Fee Related
Application number
JP02708095A
Other languages
Japanese (ja)
Other versions
JPH08221521A (en)
Inventor
寿男 丹羽
浩司 山本
英嗣 前川
一弘 萱嶋
Current Assignee
Panasonic Corp
Panasonic Holdings Corp
Original Assignee
Panasonic Corp
Matsushita Electric Industrial Co Ltd
Application filed by Panasonic Corp, Matsushita Electric Industrial Co Ltd
Priority to JP02708095A
Publication of JPH08221521A
Application granted
Publication of JP3255816B2
Anticipated expiration
Expired - Fee Related


Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a character recognition device for reading and recognizing characters written on forms and similar documents.

[0002]

2. Description of the Related Art Knowledge processing has long been applied to the output of character recognition to improve recognition accuracy. It matches the per-character recognition results against a knowledge dictionary and corrects each result to the most likely character. Depending on what is being recognized, the knowledge dictionary may be a word dictionary, a place-name dictionary, a personal-name dictionary, a part-number dictionary, or the like.

[0003] FIG. 6 is a block diagram showing the configuration of a conventional character recognition device. In FIG. 6, the character recognition unit 52 reads the form image 51 and outputs, for each character, n candidate characters and their recognition similarities. The larger the recognition similarity, the more likely the candidate character is correct. The character candidate evaluation value calculation unit 53 computes a character candidate evaluation value for each candidate character from the recognition similarities output by the character recognition unit 52. For example, each recognition similarity may be normalized by dividing it by the recognition similarity of the second candidate, and the result used as the character candidate evaluation value.
The character string search unit 54 uses the knowledge dictionary 55 to find, within the set of candidate character strings, the combinations of characters that form character strings contained in the knowledge dictionary. The character string candidate evaluation value calculation unit 56 then computes a character string candidate evaluation value from the number of characters that match the knowledge dictionary 55, the character candidate evaluation values, and similar quantities. The character string candidate selection unit 57 selects the character string candidate with the highest character string candidate evaluation value and outputs it as the recognition result, the corrected character string 58.
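A minimal Python sketch of the conventional normalization example, assuming the recognition similarities for one character position arrive as a list already in rank order (first to n-th candidate) and are positive. The function name is hypothetical.

```python
def conventional_candidate_scores(sims: list[float]) -> list[float]:
    """Conventional character candidate evaluation values for one position.

    Every recognition similarity is divided by the similarity of the
    second candidate, following the example given for FIG. 6.
    `sims` must hold at least two positive values in rank order.
    """
    if len(sims) < 2:
        raise ValueError("at least two candidates are required")
    second = sims[1]
    return [s / second for s in sims]


print(conventional_candidate_scores([80, 40, 20]))  # [2.0, 1.0, 0.5]
```

Because the reference is always the second candidate, any change in that single similarity rescales every value at the position.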

[0004] Recognizing character strings in this way allows characters misrecognized by the character recognition unit to be corrected, which improves recognition accuracy.

[0005]

However, in the conventional configuration described above, when the image to be recognized is skewed or noisy, the similarities that the character recognition unit outputs for the candidate characters fluctuate widely. The character string candidate evaluation values fluctuate with them, so the knowledge processing may select an incorrect candidate.

[0006] In addition, when filling in a form, the writer may mistakenly insert or omit a delimiter character (such as "Ōaza" (大字) or "Aza" (字) in an address, or "−" or "/" in a part number). In that case the character positions shift, so the character string search unit cannot match the input against the knowledge dictionary and cannot retrieve the correct character string candidate.

[0007] SUMMARY OF THE INVENTION The present invention solves these conventional problems. Its object is to raise the character recognition rate by computing evaluation values from reliable characters and by inferring errors made by the writer.

[0008]

To achieve the above object, the present invention uses a maximum similarity difference detection unit to examine how the similarities assigned by the character recognition unit are distributed over the candidate characters, and computes the character candidate evaluation values from that distribution. It further infers entry errors made by the writer and deletes or inserts the erroneous characters to obtain the correct character string.

[0009]

With the configuration described above, the character candidate evaluation values are derived from the pattern of similarities among the candidate characters even when the image to be recognized contains noise, so characters of low reliability contribute less to the evaluation value and misrecognition is reduced. Furthermore, by inferring the writer's entry errors, the correct character string can be obtained even when an entry error is present, which raises the character recognition rate.

[0010]

The first embodiment of the present invention is described below. FIG. 1 shows the overall configuration of the character recognition device of this embodiment.

[0011] In FIG. 1, the character recognition unit 2 first performs character recognition on the form image 1 and outputs, for each character, a candidate character set of n candidate characters (first through n-th) together with the recognition similarity of each candidate character. Among the candidate characters output by the character recognition unit 2, the maximum similarity difference detection unit 3 detects the candidate character at which the difference between the similarity of the m-th candidate and that of the (m+1)-th candidate is largest. The character candidate evaluation value calculation unit 4 computes the character candidate evaluation values from the candidate character with the largest similarity difference and the similarities of the candidate characters. The character string search unit 5 searches the knowledge dictionary 6 to select, from the combinations of the candidate character sets, those combinations that form character string candidates. The character string candidate evaluation value calculation unit 7 computes a character string candidate evaluation value for each candidate retrieved by the character string search unit 5, based on its degree of agreement with the knowledge dictionary and on the character candidate evaluation values. The character string candidate selection unit 8 selects the character string candidate with the largest character string candidate evaluation value.

[0012] The character recognition device configured as above performs character recognition as follows. First, the form image is processed by the character recognition unit 2, which outputs, for each character, a candidate character set of n candidate characters (first through n-th) and the recognition similarity $A(i,j)$ of each candidate character, where i is the character position and j denotes the j-th candidate. The larger $A(i,j)$, the more likely the candidate character is correct. FIG. 2 shows the result of character recognition with the first through fifth candidates output for each character.

[0013] For each character position i, the maximum similarity difference detection unit 3 finds the k-th candidate character that maximizes

$$A(i,k) - A(i,k+1), \qquad 1 \le k < n.$$

The character candidate evaluation value calculation unit 4 then computes the character candidate evaluation value $R(i,j)$ (i is the character position, j the j-th candidate) using different formulas for the candidates ranked at or above the k-th candidate (including the k-th candidate) and for those ranked below it. For example:

$$R(i,j) = \frac{A(i,j)}{A(i,k+1)} \quad (1 \le j \le k), \qquad R(i,j) = \frac{A(i,j)}{A(i,1)} \quad (k < j \le n).$$

That is, the candidate characters ranked at or above the k-th candidate receive higher character candidate evaluation values than those ranked below it. By grouping the candidate characters at the point of maximum similarity difference and using the candidate at that point as the reference, an absolute character candidate evaluation value can be obtained from the recognition similarity, which is only a relative value.
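A minimal Python sketch of the maximum-gap grouping and the example evaluation formulas, assuming the similarities $A(i,1), \dots, A(i,n)$ for one position are positive and given in rank order. Breaking ties toward the smallest k is an assumption; the text does not say how ties are resolved. Function names are hypothetical.

```python
def max_gap_rank(sims: list[float]) -> int:
    """Return the 1-based rank k that maximizes A(i,k) - A(i,k+1), 1 <= k < n.

    `sims` holds A(i,1..n) for one character position in rank order.
    Python's max() keeps the first maximum, so ties go to the smallest k.
    """
    if len(sims) < 2:
        raise ValueError("at least two candidates are required")
    return max(range(1, len(sims)), key=lambda k: sims[k - 1] - sims[k])


def grouped_candidate_scores(sims: list[float]) -> tuple[int, list[float]]:
    """Character candidate evaluation values R(i,j), split at the maximum gap.

    Ranks 1..k are divided by A(i,k+1); ranks k+1..n are divided by A(i,1).
    """
    k = max_gap_rank(sims)
    below_gap = sims[k]  # A(i,k+1) sits at 0-based index k
    top = sims[0]        # A(i,1)
    scores = [s / below_gap if j <= k else s / top
              for j, s in enumerate(sims, start=1)]
    return k, scores


k, scores = grouped_candidate_scores([90, 85, 40, 35, 30])
print(k)           # 2
print(scores[:2])  # [2.25, 2.125]
```

When the maximum gap is positive, every upper-group value is at least $A(i,k)/A(i,k+1) > 1$ while every lower-group value is at most 1, which is why the upper group always scores higher.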

[0014] The character string search unit 5 extracts, as character string candidates, the character strings in the knowledge dictionary 6 that partially match a combination of the candidate character sets. For example, partial matching between the candidate character sets of FIG. 2 and the knowledge dictionary shown in FIG. 3 extracts "FY−38N", "JN−28", "JP−28M", and "LR−V08" as character string candidates. For each character string candidate output by the character string search unit 5, the character string candidate evaluation value calculation unit 7 computes the character string candidate evaluation value $C = \sum_i B_i$, where $B_i = R(i,j)$ if the character at position i of the character string candidate appears as the j-th candidate character at position i, and $B_i = 0$ if it does not appear among the candidate characters at position i. The character string candidate selection unit 8 compares the evaluation values of the character string candidates, selects the candidate with the largest value, and outputs it as the corrected character string.
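A minimal Python sketch of the character string candidate evaluation $C = \sum_i B_i$ and the selection of the best candidate. Several assumptions: each position's candidates are a dictionary from character to $R(i,j)$; "partial match" is taken to mean at least `min_hits` positions agree (a hypothetical parameter); ASCII "-" stands in for "−"; and the evaluation values are invented for illustration, not read from FIG. 2.

```python
def string_candidate_score(word: str, cand_sets: list[dict[str, float]]) -> float:
    """C = sum_i B_i, with B_i = R(i,j) if word[i] is a candidate at position i, else 0."""
    return sum(cand_sets[i].get(ch, 0.0)
               for i, ch in enumerate(word) if i < len(cand_sets))


def select_string_candidate(dictionary, cand_sets, min_hits=1):
    """Keep dictionary strings matching at least `min_hits` positions; return the best."""
    best_word, best_score = None, 0.0
    for word in dictionary:
        hits = sum(1 for i, ch in enumerate(word)
                   if i < len(cand_sets) and ch in cand_sets[i])
        if hits < min_hits:
            continue  # no partial match, so not a character string candidate
        score = string_candidate_score(word, cand_sets)
        if best_word is None or score > best_score:
            best_word, best_score = word, score
    return best_word, best_score


# Hypothetical evaluation values R(i,j) for a six-character field.
cand_sets = [
    {"J": 2.0, "F": 0.5},
    {"P": 1.75, "N": 0.5},
    {"-": 3.0},
    {"2": 2.25, "3": 0.75},
    {"8": 2.5},
    {"M": 1.5, "N": 0.25},
]
dictionary = ["FY-38N", "JN-28", "JP-28M", "LR-V08"]
print(select_string_candidate(dictionary, cand_sets))  # ('JP-28M', 13.0)
```

Positions beyond the end of the candidate sets contribute $B_i = 0$, so a shorter or longer dictionary string is still scored rather than rejected.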

[0015] As described above, computing the character candidate evaluation values with the candidate at the largest similarity difference as the reference makes it possible to obtain correct character candidate evaluation values even for characters that have similarly shaped counterparts.

[0016] Next, a second embodiment of the present invention is described. FIG. 4 shows the overall configuration of the character recognition device of this embodiment.

[0017] The character recognition unit 2 performs character recognition on the form image 1 and outputs, for each character, a candidate character set of n candidate characters (first through n-th) together with the recognition similarity of each candidate character. The character candidate evaluation value calculation unit 4 computes the character candidate evaluation values from the similarities of the candidate characters. The character string search unit 5 searches the knowledge dictionary 6 to select, from the combinations of the candidate character sets, those combinations that form character string candidates. The character string candidate evaluation value calculation unit 7 computes a character string candidate evaluation value for each candidate retrieved by the character string search unit 5, based on its degree of agreement with the knowledge dictionary and on the character candidate evaluation values. The character string candidate selection unit 8 selects the character string candidate with the largest character string candidate evaluation value.

[0018] The erroneous entry inference unit 9 infers whether the writer made an entry error, using the character string candidate evaluation value of the candidate selected by the character string candidate selection unit 8 and the candidate character set. When the writer has made an entry error, the delimiter deletion/insertion unit 10 deletes or inserts the erroneously entered character. FIG. 5 shows the internal configuration of the erroneous entry inference unit 9, which is described below.

[0019] The high character evaluation value character selection unit 22 selects candidate characters with high character evaluation values, and the delimiter search unit 23 searches the candidate characters output by the high character evaluation value character selection unit 22 for delimiters. The erroneous insertion inference unit 24 infers whether the writer erroneously inserted a delimiter and, when it infers an erroneous insertion, outputs a delimiter deletion instruction 30. The misrecognized character addition unit 25 uses the misrecognition dictionary 28 to add characters that are easily misrecognized, based on the candidate characters output by the high character evaluation value character selection unit 22. The forward partial match search unit 26 searches the candidate characters output by the high character evaluation value character selection unit 22 for characters that forward-match (match from the first character) a character string contained in the knowledge dictionary. The erroneous omission inference unit 27 infers whether the writer erroneously omitted a delimiter and, when it infers an erroneous omission, outputs a delimiter insertion instruction 31.

[0020] The character recognition device configured as above performs character recognition as follows. First, the form image is processed by the character recognition unit 2, which outputs, for each character, a candidate character set of n candidate characters (first through n-th) and the recognition similarity $A(i,j)$ of each candidate character (i is the character position, j the j-th candidate). The character candidate evaluation value calculation unit 4 computes the character candidate evaluation value $R(i,j)$ (i is the character position, j the j-th candidate) from the recognition similarity and related information of each candidate character.

[0021] The character string search unit 5 extracts, as character string candidates, the character strings in the knowledge dictionary 6 that partially match a combination of the candidate character sets. For example, partial matching between the candidate character sets of FIG. 2 and the knowledge dictionary shown in FIG. 3 extracts "FY−38N", "JN−28", "JP−28M", and "LR−V08" as character string candidates.

[0022] For each character string candidate output by the character string search unit 5, the character string candidate evaluation value calculation unit 7 computes the character string candidate evaluation value $C = \sum_i B_i$, where $B_i = R(i,j)$ if the character at position i of the character string candidate appears as the j-th candidate character at position i, and $B_i = 0$ if it does not appear among the candidate characters at position i.

[0023] The character string candidate selection unit 8 compares the character string candidate evaluation values of the character string candidates and selects the candidate with the largest value.

[0024] The high character evaluation value character selection unit 22 selects, from the candidate character set, the candidate characters whose character evaluation value is at or above a fixed value. Alternatively, it fixes a candidate rank m ($1 < m < n$) and selects the candidate characters ranked within the m-th candidate.
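A minimal Python sketch of the two selection rules just described, assuming each position's candidates are (character, evaluation value) pairs in rank order. The function name, the threshold, and m are illustrative values, not taken from the text.

```python
def select_high_candidates(cands, threshold=None, m=None):
    """Candidate characters kept by the high character evaluation value selection.

    `cands` lists (character, evaluation value) pairs for one position in
    rank order. Either keep values >= threshold, or keep the top m candidates.
    """
    if threshold is not None:
        return [ch for ch, r in cands if r >= threshold]
    if m is not None:
        return [ch for ch, _ in cands[:m]]
    raise ValueError("give either threshold or m")


position = [("8", 2.5), ("B", 1.25), ("3", 0.5), ("6", 0.25)]
print(select_high_candidates(position, threshold=1.0))  # ['8', 'B']
print(select_high_candidates(position, m=3))            # ['8', 'B', '3']
```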

[0025] The delimiter search unit 23 searches the candidate characters selected by the high character evaluation value character selection unit 22 for delimiters. The erroneous insertion inference unit 24 acts when the character string candidate evaluation value of the candidate output by the character string candidate selection unit 8 is at or below a fixed value (that is, the candidate is unlikely to be correct) and the delimiter search unit 23 has found a delimiter. In that case it outputs the position of the delimiter to the delimiter deletion/insertion unit 10 as a delimiter deletion instruction 30. Conversely, when that evaluation value is at or above the fixed value (that is, the candidate is likely to be correct), or when the delimiter search unit 23 has found no delimiter, the character string candidate output by the character string candidate selection unit 8 is passed to the misrecognized character addition unit 25.
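A minimal Python sketch of the erroneous-insertion decision. It assumes ASCII "-" and "/" as the part-number delimiters and a caller-supplied threshold. When several positions hold a delimiter, it returns the first one; that tie rule is also an assumption.

```python
DELIMITERS = frozenset({"-", "/"})  # assumed ASCII stand-ins for the part-number delimiters


def infer_erroneous_insertion(best_score, high_sets, score_threshold,
                              delimiters=DELIMITERS):
    """Return the character position for a delimiter deletion instruction, or None.

    best_score: evaluation value of the string chosen by the selection unit.
    high_sets: per-position candidate characters kept by the high-value selection.
    A deletion is proposed only when the score is at or below the threshold
    and a delimiter occurs among the high-value candidates.
    """
    if best_score > score_threshold:
        return None  # likely correct: pass the string on unchanged
    for i, chars in enumerate(high_sets):
        if delimiters.intersection(chars):
            return i  # first delimiter position
    return None


high_sets = [["J"], ["P"], ["2"], ["-", "7"], ["8"], ["M"]]
print(infer_erroneous_insertion(4.0, high_sets, score_threshold=6.0))   # 3
print(infer_erroneous_insertion(12.0, high_sets, score_threshold=6.0))  # None
```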

[0026] For characters that the character recognition unit 2 is likely to misrecognize, the misrecognition dictionary 28 stores pairs of the misrecognized result character and the correct character. If a candidate character selected by the high character evaluation value character selection unit 22 matches a misrecognized result character in the misrecognition dictionary 28, the misrecognized character addition unit 25 adds the corresponding correct character from the misrecognition dictionary 28 to the candidate characters selected by the high character evaluation value character selection unit 22.
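A minimal Python sketch of the misrecognized character addition step. It assumes the misrecognition dictionary maps each misrecognized result character to the correct character or characters it is often mistaken for. The entries below are hypothetical.

```python
def add_misrecognized(high_sets, confusion):
    """Extend each position with correct characters from a misrecognition dictionary.

    `confusion` maps a misrecognition result character to the correct
    character(s) it is often mistaken for; duplicates are not added twice.
    """
    extended_sets = []
    for chars in high_sets:
        extended = list(chars)
        for ch in chars:
            for correct in confusion.get(ch, ()):
                if correct not in extended:
                    extended.append(correct)
        extended_sets.append(extended)
    return extended_sets


confusion = {"1": ["/"], "O": ["0"]}  # hypothetical entries
print(add_misrecognized([["J"], ["1", "7"], ["O"]], confusion))
# [['J'], ['1', '7', '/'], ['O', '0']]
```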

[0027] The forward partial match search unit 26 searches the knowledge dictionary for a character string that meets two conditions. First, it must forward-match a combination of the candidate characters selected by the high character evaluation value character selection unit 22 and the candidate characters added by the misrecognized character addition unit 25. Second, its first character that fails to match must be a delimiter. The erroneous omission inference unit 27 acts when the character string candidate evaluation value of the candidate output by the character string candidate selection unit 8 is at or below a fixed value (that is, the candidate is unlikely to be correct) and the forward partial match search unit 26 has found such a string. In that case it outputs the character position of the first non-matching character to the delimiter deletion/insertion unit 10 as a delimiter insertion instruction 31. Conversely, when that evaluation value is at or above the fixed value (that is, the candidate is likely to be correct), or when the forward partial match search unit 26 has found no such string, the character string candidate output by the character string candidate selection unit 8 is output as the corrected character string 11.
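A minimal Python sketch of the forward partial match search combined with the erroneous-omission decision. Several assumptions: each position's candidates are a plain list of characters (already extended by the misrecognition dictionary); a forward match requires a non-empty matching prefix; ASCII "-" and "/" are the delimiters; and the first qualifying dictionary string wins. Function names and input data are hypothetical.

```python
def first_mismatch(word, cand_sets):
    """Index of the first position where word[i] is not a candidate, or None."""
    for i, ch in enumerate(word):
        if i >= len(cand_sets) or ch not in cand_sets[i]:
            return i
    return None  # every character of word matched


def infer_erroneous_omission(best_score, dictionary, cand_sets, score_threshold,
                             delimiters=frozenset({"-", "/"})):
    """Return (dictionary string, insertion position) or None.

    A dictionary string qualifies when a non-empty prefix matches the
    candidate sets and its first non-matching character is a delimiter.
    """
    if best_score > score_threshold:
        return None  # likely correct: output the selected string as-is
    for word in dictionary:
        i = first_mismatch(word, cand_sets)
        if i is not None and i > 0 and word[i] in delimiters:
            return word, i
    return None


# Hypothetical input: the writer wrote "JP28M" and left out the "-".
cand_sets = [["J"], ["P", "F"], ["2"], ["8", "B"], ["M"]]
dictionary = ["FY-38N", "JN-28", "JP-28M", "LR-V08"]
print(infer_erroneous_omission(3.0, dictionary, cand_sets, score_threshold=6.0))
# ('JP-28M', 2)
```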

[0028] The delimiter deletion/insertion unit 10 edits the candidate character set according to the delimiter deletion instruction 30 or the delimiter insertion instruction 31. For a delimiter deletion instruction 30, it deletes the candidate characters at the delimiter's character position and shifts the candidate characters at each later position forward by one position. For a delimiter insertion instruction 31, it shifts the candidate characters at the delimiter's character position and every later position back by one position, and makes the delimiter the first candidate at the delimiter's character position. It then outputs the edited candidate character set to the character string search unit 5.
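A minimal Python sketch of the two editing operations on the candidate character set, assuming each position is a list of (character, evaluation value) pairs. The text does not specify which evaluation value the inserted delimiter receives, nor whether other candidates share its position. The `score` parameter and the single-candidate position are therefore assumptions.

```python
def delete_delimiter(cand_sets, pos):
    """Delimiter deletion: drop position pos; later positions move forward by one."""
    return cand_sets[:pos] + cand_sets[pos + 1:]


def insert_delimiter(cand_sets, pos, delimiter, score=1.0):
    """Delimiter insertion: positions from pos onward move back by one, and the
    delimiter becomes the first (in this sketch, the only) candidate at pos.

    `score` is an assumed evaluation value for the inserted delimiter.
    """
    return cand_sets[:pos] + [[(delimiter, score)]] + cand_sets[pos:]


cands = [[("J", 2.0)], [("P", 1.75)], [("2", 2.25)], [("8", 2.5)], [("M", 1.5)]]
fixed = insert_delimiter(cands, 2, "-")
print("".join(p[0][0] for p in fixed))                       # JP-28M
print("".join(p[0][0] for p in delete_delimiter(fixed, 2)))  # JP28M
```

Both functions return new lists rather than editing in place, so the original candidate set stays available if the re-run string search still finds no better candidate.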

[0029] As described above, using the misrecognition dictionary 28 makes it possible to infer erroneous omissions while taking errors of the character recognition unit 2 into account.

[0030] A recognition experiment was conducted with the first and second embodiments on part numbers written on forms. The recognition rate was 29.5% without post-processing, 81.9% with the first embodiment alone, and 93.7% with both the first and second embodiments.

[0031]

As is clear from the embodiments above, the character recognition device of the present invention computes the character candidate evaluation values with the candidate at the largest similarity difference as the reference, so correct character candidate evaluation values can be obtained even for characters that have similarly shaped counterparts. In addition, because a misrecognition dictionary is used, erroneous omissions can be inferred while taking errors of the character recognition unit into account, so the correct character string can be determined even when the writer mistakenly omits a delimiter. Performing character recognition with this configuration raises the recognition rate, and the effect is substantial.

[Brief description of the drawings]

FIG. 1 is a configuration diagram of the character recognition device according to the first embodiment of the present invention.

FIG. 2 is an output diagram of the character recognition unit according to the first embodiment of the present invention.

FIG. 3 is an output diagram of the character string search unit according to the first embodiment of the present invention.

FIG. 4 is a configuration diagram of the character recognition device according to the second embodiment of the present invention.

FIG. 5 is a configuration diagram of the erroneous entry inference unit according to the second embodiment of the present invention.

FIG. 6 is a block diagram showing the configuration of a conventional character recognition device.

[Explanation of symbols]

1 Form image
2 Character recognition unit
3 Maximum similarity difference detection unit
4 Character candidate evaluation value calculation unit
5 Character string search unit
6 Knowledge dictionary
7 Character string candidate evaluation value calculation unit
8 Character string candidate selection unit
9 Erroneous entry inference unit
10 Delimiter deletion/insertion unit
11 Corrected character string
21 Candidate character set
22 High character evaluation value character selection unit
23 Delimiter search unit
24 Erroneous insertion inference unit
25 Misrecognized character addition unit
26 Forward partial match search unit
27 Erroneous omission inference unit
28 Misrecognition dictionary
29 Corrected character string
30 Delimiter deletion instruction
31 Delimiter insertion instruction

Continuation of the front page
(72) Inventor: Eiji Maekawa, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma-shi, Osaka
(72) Inventor: Kazuhiro Kayashima, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma-shi, Osaka
(56) References: JP-A-4-205457 (JP, A); JP-A-5-62022 (JP, A)

Claims (2)

(57) [Claims]
[Claim 1] A character string recognition device comprising:
a character recognition unit that recognizes an input image character by character and outputs, for each character, n candidate characters and their recognition similarities;
a maximum similarity difference detection unit that, where A(i,j) is the recognition similarity of the j-th candidate at character position i, detects the candidate for which the recognition similarity difference A(i,j) - A(i,j+1) is largest;
a character candidate evaluation value calculation unit that, taking as the k-th candidate the candidate character for which A(i,k) - A(i,k+1) is largest, applies different calculation methods to the higher-ranked candidate characters up to and including the k-th candidate and to the candidate characters ranked below the k-th candidate, so that the former receive higher character candidate evaluation values than the latter;
a character string search unit that searches for character strings contained in a knowledge dictionary that partially match a combination of candidate characters, and outputs them as character string candidates;
a character string candidate evaluation value calculation unit that obtains a character string candidate evaluation value for each character string candidate from the character candidate evaluation values of its characters; and
a character string candidate selection unit that selects the character string candidate having the largest character string candidate evaluation value.
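Claim 1 gives no concrete formulas for the evaluation values or for what counts as a partial match. The following Python sketch shows one way the pipeline could fit together, under assumptions that are not taken from the patent: recognition similarities lie in [0, 1]; a hypothetical two-tier rule scores the candidates ranked 1 to k as 1 + A(i,j) and those ranked below k as A(i,j), which guarantees the ordering the claim requires; a dictionary string "partially matches" when at least `min_matches` of its characters appear among the candidates at the same position; and the character string candidate evaluation value is the sum of the matched characters' evaluation values divided by the string length. All function names are illustrative.

```python
from typing import Dict, List, Optional, Tuple

# Candidates at one character position: (character, recognition similarity),
# sorted by descending similarity, as the character recognition unit outputs them.
Candidates = List[Tuple[str, float]]


def max_gap_rank(cands: Candidates) -> int:
    """Return k (1-based) such that A(i,k) - A(i,k+1) is largest."""
    if len(cands) < 2:
        return len(cands)
    gaps = [cands[j][1] - cands[j + 1][1] for j in range(len(cands) - 1)]
    return gaps.index(max(gaps)) + 1


def char_eval_values(cands: Candidates) -> Dict[str, float]:
    """Hypothetical two-tier rule: ranks 1..k score 1 + A, ranks below k score A.

    With similarities in [0, 1] sorted in descending order, every candidate up
    to and including the k-th scores higher than every candidate below it.
    """
    k = max_gap_rank(cands)
    return {ch: (1.0 + sim) if rank <= k else sim
            for rank, (ch, sim) in enumerate(cands, start=1)}


def search_string_candidates(positions: List[Candidates], dictionary: List[str],
                             min_matches: int) -> List[Tuple[str, float]]:
    """Partial-match each dictionary string against the candidate combinations
    and score it by the mean evaluation value over all of its positions
    (unmatched positions contribute 0)."""
    evals = [char_eval_values(c) for c in positions]
    results = []
    for word in dictionary:
        if len(word) != len(evals):
            continue  # hypothetical simplification: equal-length strings only
        matched = [ev[ch] for ch, ev in zip(word, evals) if ch in ev]
        if len(matched) >= min_matches:
            results.append((word, sum(matched) / len(word)))
    return results


def select_best(results: List[Tuple[str, float]]) -> Optional[Tuple[str, float]]:
    """Pick the string candidate with the largest string evaluation value."""
    return max(results, key=lambda r: r[1]) if results else None


positions = [
    [("C", 0.90), ("G", 0.60), ("O", 0.55)],
    [("A", 0.80), ("R", 0.78), ("H", 0.30)],
    [("I", 0.70), ("T", 0.65), ("F", 0.20)],
]
results = search_string_candidates(positions, ["CAT", "CAR", "COT", "DOG"], min_matches=2)
best = select_best(results)
print(best)
```

In this example the top-ranked reading of each position spells "CAI", which is not in the dictionary. Because "T" sits above the largest similarity gap at the third position, it still receives a high evaluation value, so "CAT" is selected over "CAR" and "COT".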
[Claim 2] A character recognition device comprising:
a character recognition unit that recognizes an input image character by character and outputs, for each character, n candidate characters and their recognition similarities;
a character candidate evaluation value calculation unit that obtains a character candidate evaluation value from the recognition similarity of each candidate character;
a character string search unit that searches for character strings contained in a knowledge dictionary that partially match a combination of candidate characters, and outputs them as character string candidates;
a character string candidate evaluation value calculation unit that obtains a character string candidate evaluation value for each character string candidate from the character candidate evaluation values of its characters;
a character string candidate selection unit that selects the character string candidate having the largest character string candidate evaluation value;
an erroneous entry inference unit that infers an erroneous entry by the writer from the selected character string candidate and the candidate character set; and
a delimiter deletion/insertion unit that deletes or inserts the erroneously entered character;
wherein the erroneous entry inference unit comprises:
a high character evaluation value character selection unit that selects candidate characters whose character evaluation value is equal to or greater than a predetermined value;
a delimiter search unit that searches for a delimiter among the candidate characters selected by the high character evaluation value character selection unit;
an erroneous insertion inference unit that, when the character string candidate evaluation value is equal to or less than a predetermined value and the delimiter search unit has found a delimiter, infers that the writer inserted that delimiter by mistake and deletes the candidate characters at the delimiter's character position from the candidate character set;
a misrecognized character addition unit that, when a candidate character selected by the high character evaluation value character selection unit is a misrecognition result character listed in a misrecognition dictionary, adds the corresponding correct character from the misrecognition dictionary to the candidate characters;
a forward partial match search unit that performs a forward partial match (prefix) search between the character strings of the knowledge dictionary and combinations of the candidate characters selected by the high character evaluation value character selection unit and the candidate characters added by the misrecognized character addition unit; and
an erroneous omission inference unit that, when the character string candidate evaluation value is equal to or less than a predetermined value and a forward partially matching character string has been found, infers that the writer omitted a delimiter at the first character position that did not match, and inserts a delimiter into the candidate character set at that position.
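Claim 2 leaves the thresholds, the delimiter characters and the misrecognition dictionary unspecified. The Python sketch below illustrates the erroneous insertion and erroneous omission inferences under hypothetical choices: a single delimiter "-", a two-entry misrecognition dictionary (`MISRECOGNITION`), candidate sets given as per-position maps from character to evaluation value, the score of the already-selected string passed in as `string_score`, and, when several dictionary strings match forward, the longest matched prefix deciding where the delimiter goes. Re-running the string search on the corrected candidate set is left out.

```python
from typing import Dict, List, Set

# Per character position: candidate character -> character evaluation value.
CandidateSet = List[Dict[str, float]]

DELIMITER = "-"                        # hypothetical delimiter character
MISRECOGNITION = {"Z": "2", "O": "0"}  # hypothetical: misread result -> correct char


def select_high_value(cands: CandidateSet, char_thr: float) -> List[Set[str]]:
    """Keep, at each position, the candidates whose evaluation value >= char_thr."""
    return [{ch for ch, v in pos.items() if v >= char_thr} for pos in cands]


def infer_misinsertion(cands: CandidateSet, string_score: float,
                       char_thr: float, str_thr: float) -> CandidateSet:
    """Low-scoring string + high-value delimiter => delete that position."""
    if string_score > str_thr:
        return cands
    for pos, chars in enumerate(select_high_value(cands, char_thr)):
        if DELIMITER in chars:
            return cands[:pos] + cands[pos + 1:]
    return cands


def add_misrecognized(chars: Set[str]) -> Set[str]:
    """Add the correct character for each known misrecognition result."""
    return chars | {MISRECOGNITION[c] for c in chars if c in MISRECOGNITION}


def matched_prefix_len(word: str, sets: List[Set[str]]) -> int:
    """Forward partial match: number of leading characters of `word` found
    among the candidates at the same positions."""
    n = 0
    while n < len(word) and n < len(sets) and word[n] in sets[n]:
        n += 1
    return n


def infer_misomission(cands: CandidateSet, dictionary: List[str],
                      string_score: float, char_thr: float,
                      str_thr: float) -> CandidateSet:
    """Low-scoring string + forward partial match => insert a delimiter at
    the first unmatched position of the longest-matching dictionary string."""
    if string_score > str_thr:
        return cands
    sets = [add_misrecognized(s) for s in select_high_value(cands, char_thr)]
    best = 0
    for word in dictionary:
        n = matched_prefix_len(word, sets)
        if n == len(word):
            return cands  # the whole word matches: nothing to insert
        best = max(best, n)
    if best == 0:
        return cands  # no forward partial match found
    # Hypothetical evaluation value assigned to the inserted delimiter.
    return cands[:best] + [{DELIMITER: 1.0}] + cands[best:]


# "AB-CD" written where "ABCD" was expected: the "-" position is deleted.
extra = [{"A": 1.9}, {"B": 1.8}, {"-": 1.7, "=": 0.2}, {"C": 1.6}, {"D": 1.9}]
cleaned = infer_misinsertion(extra, string_score=0.3, char_thr=1.0, str_thr=0.5)

# "1234" written (2nd char misread as "Z") where the dictionary has "12-34".
written = [{"1": 1.9}, {"Z": 1.6, "7": 0.3}, {"3": 1.8}, {"4": 1.7}]
fixed = infer_misomission(written, ["12-34", "56-78"], string_score=0.4,
                          char_thr=1.0, str_thr=0.5)
print([max(p, key=p.get) for p in cleaned])  # top candidate per position
print([max(p, key=p.get) for p in fixed])
```

Both functions return a corrected candidate character set rather than a final string, following the claim's wording that characters are deleted from, or inserted into, the candidate character set.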

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP02708095A JP3255816B2 1995-02-15 1995-02-15 Character recognition device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP02708095A JP3255816B2 1995-02-15 1995-02-15 Character recognition device

Publications (2)

Publication Number Publication Date
JPH08221521A 1996-08-30
JP3255816B2 2002-02-12

Family

ID=12211109

Family Applications (1)

Application Number Status Publication Priority Date Filing Date Title
JP02708095A Expired - Fee Related JP3255816B2 1995-02-15 1995-02-15 Character recognition device

Country Status (1)

Country Link
JP (1) JP3255816B2

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115169335A * 2022-09-07 2022-10-11 深圳高灯计算机科技有限公司 (Shenzhen Gaodeng Computer Technology Co., Ltd.) Invoice data calibration method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
JPH08221521A 1996-08-30

Similar Documents

Publication Title
KR910007531B1 Syllable recognition device
CN111859921A Text error correction method and device, computer equipment and storage medium
JP3255816B2 Character recognition device
JPS5854433B2 Difference detection device
JP3085107B2 Character recognition device
Chaudhuri et al. OCR error detection and correction of an inflectional Indian language script
JP3350127B2 Character recognition device
JPH11328316A Device and method for character recognition and storage medium
JP2827066B2 Post-processing method for character recognition of documents with mixed digit strings
JP4318223B2 Document proofing apparatus and program storage medium
JPS646514B2
JPH0290384A Post-processing system for character recognizing device
JP3123181B2 Character recognition device
JPH0757059A Character recognition device
JP3264961B2 Character recognition device
JP3924899B2 Text search apparatus and text search method
JP2006294069A Document corrector and program storage medium
Muliadi et al. Comparison of String Similarity Algorithm in post-processing OCR
CN113901795A Chinese spelling error correction method based on behavior data statistics
JPS60138689A Character recognizing method
JP3725206B2 Character recognition device
JP3659688B2 Character recognition device
JPH05225402A Character recognition device
CN111639488A English word correction system, method, application, device and readable storage medium
JPH02118785A Method for correcting erroneous recognition

Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20071130

Year of fee payment: 6

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20081130

Year of fee payment: 7

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20091130

Year of fee payment: 8

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20101130

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20111130

Year of fee payment: 10

LAPS Cancellation because of no payment of annual fees