JP3085107B2 - Character recognition device - Google Patents

Character recognition device

Info

Publication number
JP3085107B2
Authority
JP
Japan
Prior art keywords
character
candidate
evaluation value
unit
string
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP06268640A
Other languages
Japanese (ja)
Other versions
JPH08129616A (en)
Inventor
寿男 丹羽
浩司 山本
英嗣 前川
一弘 萱嶋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Corp
Panasonic Holdings Corp
Original Assignee
Panasonic Corp
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Panasonic Corp, Matsushita Electric Industrial Co Ltd
Priority to JP06268640A
Publication of JPH08129616A
Application granted
Publication of JP3085107B2
Anticipated expiration
Expired - Lifetime

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a character recognition device that reads and recognizes characters written on forms and similar documents.

[0002]

[Prior Art] Knowledge processing has long been applied to the results of character recognition in order to improve recognition accuracy. It checks the recognition result for each character against a knowledge dictionary and corrects the result to the most probable character. Depending on the content being recognized, the knowledge dictionary may be a word dictionary, a place-name dictionary, a personal-name dictionary, a part-number dictionary, or the like.

[0003] FIG. 6 shows the configuration of a conventional character recognition device; its operation is described below with reference to that figure. As shown, the character recognition unit 52 reads the form image 51 and outputs n candidate characters for each character. The character string search unit 53 uses the knowledge dictionary 54 to find, within the set of candidate character strings, the combinations of characters that form character strings contained in the knowledge dictionary 54. The character string candidate evaluation value calculation unit 55 then computes a character string candidate evaluation value based on factors such as the number of characters that match the knowledge dictionary 54 and the similarity reported by the character recognition unit 52. The character string candidate selection unit 56 selects the character string candidate with the highest evaluation value and outputs it as the recognition result. Recognizing whole character strings in this way makes it possible to correct characters that the character recognition unit 52 misrecognized, which improves recognition accuracy.

[0004]

[Problems to Be Solved by the Invention] In the conventional configuration described above, however, when the image to be recognized is skewed or noisy, the similarities of the candidate characters output by the character recognition unit 52 fluctuate widely. The character string candidate evaluation values fluctuate with them, so the knowledge processing may select a wrong candidate.

[0005] In addition, when filling in a form, the writer may mistakenly insert or omit a delimiter character (for example, "町" (town) or "字" (a district subdivision) in an address, or "-" or "/" in a part number). The character positions then shift, so the character string search unit cannot match the input against the knowledge dictionary and cannot find the correct character string candidate.

[0006] The present invention solves these conventional problems. Its object is to raise the character recognition rate by computing evaluation values from the characters that are most likely to be reliable and by inferring errors made by the writer.

[0007]

[Means for Solving the Problems] To achieve this object, the character string candidate evaluation value calculation unit of the present invention computes the character string candidate evaluation value from the distribution of the similarities that the character recognition unit assigns to each candidate character and from the frequency with which characters appear in the knowledge dictionary. In addition, the invention infers entry errors made by the writer and deletes or inserts the erroneous characters to obtain the correct character string.

[0008]

[Operation] With the configuration described above, even when the image to be recognized is noisy, the character string candidate evaluation value is derived from the tendency of the similarities of the candidate characters. Characters with low reliability therefore contribute less to the evaluation value, and misrecognition is reduced. Furthermore, by inferring the writer's entry errors, the correct character string can be obtained even when an entry error is present, which improves the character recognition rate.

[0009]

[Embodiment] An embodiment of the present invention is described below. FIG. 1 shows the overall configuration of the character recognition device of this embodiment.

[0010] The character recognition unit 2 performs character recognition on the form image 1. For each character, it outputs a candidate character set consisting of n candidate characters, from the first candidate character to the n-th candidate character, together with the recognition similarity of each candidate character.

[0011] The character string search unit 3 searches the knowledge dictionary 4 to pick out, from the combinations of the candidate character sets, those combinations that qualify as character string candidates. The character string candidate evaluation value calculation unit 5 computes a character string candidate evaluation value for each character string candidate found by the character string search unit 3, based on its degree of match with the knowledge dictionary 4, the recognition similarity from the character recognition unit 2, and the frequency with which characters appear in the knowledge dictionary 4. The character string candidate selection unit 6 selects the character string candidate with the largest character string candidate evaluation value.

[0012] The erroneous-entry inference unit 7 infers whether the writer made an entry error, using the character string candidate evaluation value of the character string candidate selected by the character string candidate selection unit 6 and the candidate character sets. When an entry error is found, the delimiter deletion/insertion unit 8 deletes or inserts the erroneously entered character.

[0013] FIG. 2 shows the internal configuration of the character string candidate evaluation value calculation unit. The similarity evaluation unit 12 evaluates similarity from the tendency of the similarities of the candidate characters. The character appearance frequency evaluation unit 13 evaluates character appearance frequency from the frequency information of the characters contained in the knowledge dictionary. The character evaluation value calculation unit 14 derives the character evaluation value of each candidate character from the similarity evaluation and the character appearance frequency evaluation. The non-candidate character evaluation unit 15 evaluates non-candidate characters, that is, characters of a character string candidate that match none of the candidate characters. The character string evaluation value derivation unit 16 derives the character string evaluation value of each character string candidate from the character evaluation values of its characters.

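[0014] FIG. 3 shows the internal configuration of the erroneous-entry inference unit. The high-evaluation-value character selection unit 22 selects candidate characters with high character evaluation values, and the delimiter search unit 23 searches the candidate characters output by the high-evaluation-value character selection unit 22 for delimiter characters. The erroneous-insertion inference unit 24 infers whether the writer inserted a delimiter by mistake and, if it infers that an erroneous insertion occurred, outputs a delimiter deletion instruction 28. The forward partial match search unit 25 searches the candidate characters output by the high-evaluation-value character selection unit 22 for characters that form a forward partial (prefix) match with a character string contained in the knowledge dictionary. The erroneous-omission inference unit 26 infers whether the writer omitted a delimiter by mistake and, if it infers that an erroneous omission occurred, outputs a delimiter insertion instruction 29.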
The delimiter search unit 23 searches for a delimiter from among the candidate characters output from the high-character evaluation value character selection unit 22. The erroneous insertion inference unit 24 infers the erroneous insertion of the delimiter by the writer, and outputs the delimiter deletion instruction 28 when inferring that there is an erroneous insertion. The front part match search unit 25 searches the candidate characters output from the high character evaluation value character selection unit 22 for a character whose front part matches the character string included in the knowledge dictionary. The erroneous omission inference unit 26 infers an erroneous omission of the delimiter of the entry person and outputs a delimiter insertion instruction 29 when inferring that there is an erroneous omission.

[0015] The character recognition device configured as above performs character recognition as follows. First, the character recognition unit 2 processes the form image 1 and obtains, for each character, a candidate character set of n candidate characters, from the first candidate character to the n-th candidate character, together with the recognition similarity Aij of each candidate character (i is the character position and j is the candidate rank). FIG. 4 shows the result of character recognition by the character recognition unit 2, with the first through fifth candidate characters obtained for each character. The character string search unit 3 extracts, as character string candidates, those character strings in the knowledge dictionary 4 that partially match a combination of the candidate character sets. For example, partial matching of the candidate character sets of FIG. 4 against the knowledge dictionary 4 shown in FIG. 5 extracts "FY-38N", "JN-28", "JP-28M", and "LR-V08" as character string candidates.
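The partial matching performed by the character string search unit can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `search_string_candidates`, the `min_hits` threshold, and the sample data are all hypothetical, since the text does not say how many positions must agree and FIG. 4 and FIG. 5 are not reproduced here.

```python
def search_string_candidates(candidate_sets, dictionary, min_hits=2):
    """Return dictionary strings that partially match the candidate sets.

    candidate_sets: one list of candidate characters per character position.
    min_hits: assumed minimum number of positions whose dictionary character
              appears among that position's candidates.
    """
    results = []
    for entry in dictionary:
        hits = sum(
            1
            for i, ch in enumerate(entry)
            if i < len(candidate_sets) and ch in candidate_sets[i]
        )
        if hits >= min_hits:
            results.append(entry)
    return results


# Hypothetical recognizer output: three candidates per position.
candidate_sets = [
    ["J", "I", "L"],
    ["P", "F", "R"],
    ["-", "1", "="],
    ["2", "Z", "3"],
    ["8", "B", "0"],
    ["M", "N", "H"],
]
# Hypothetical knowledge dictionary of part numbers.
dictionary = ["JP-28M", "LR-V08", "AB-123", "XY/999"]

candidates = search_string_candidates(candidate_sets, dictionary, min_hits=3)
print(candidates)
```

With this sample data, "JP-28M" agrees at all six positions and "LR-V08" at four, so both pass the assumed threshold of three, while the other two entries do not.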

[0016] For each character string candidate output by the character string search unit 3, the similarity evaluation unit 12 examines, at each character position, the similarity of the candidate character that matches the character string candidate, and outputs a similarity evaluation value Ri derived from that similarity for character position i. When the character that matches the character string candidate is the j-th candidate, Ri is a function of Aij; for example, Ri = Aij. Alternatively, it can be normalized by the recognition similarity Ai1 of the first candidate, giving Ri = Aij / Ai1. If no candidate character matches the character string candidate, Ri = 0. The character appearance frequency evaluation unit 13 obtains the frequency probability Pi with which each character of the character string candidate output by the character string search unit 3 appears at the same character position i in the character strings contained in the knowledge dictionary 4. For example, the frequency probability P1 of the character string candidate "FY-38N" is the probability that "F" appears at character position 1 among the character strings contained in the knowledge dictionary.
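The two quantities defined above, the similarity evaluation value Ri and the frequency probability Pi, can be computed along the following lines. This is a hedged sketch: the function names, the similarity values, and the four-entry dictionary are illustrative assumptions, and positions are 0-based in the code whereas the text counts from 1.

```python
def similarity_evaluation(ranked, target, normalize=True):
    """Return R_i for one character position.

    ranked: list of (character, similarity A_ij), best candidate first.
    target: the character of the string candidate at this position.
    """
    top_similarity = ranked[0][1]  # A_i1, similarity of the first candidate
    for char, similarity in ranked:
        if char == target:
            # R_i = A_ij / A_i1 when normalized, otherwise R_i = A_ij
            return similarity / top_similarity if normalize else similarity
    return 0.0  # no candidate character matches the string candidate


def frequency_probability(dictionary, position, char):
    """Return P_i: share of dictionary strings with `char` at `position`."""
    hits = sum(
        1 for entry in dictionary
        if len(entry) > position and entry[position] == char
    )
    return hits / len(dictionary)


# Hypothetical candidates for position 1: "F" is only the second candidate.
ranked = [("E", 0.90), ("F", 0.60), ("P", 0.30)]
# Assumed dictionary; the real contents of FIG. 5 are not known here.
dictionary = ["FY-38N", "JN-28", "JP-28M", "LR-V08"]

r_1 = similarity_evaluation(ranked, "F")
p_1 = frequency_probability(dictionary, 0, "F")
p_delimiter = frequency_probability(dictionary, 2, "-")
print(r_1, p_1, p_delimiter)
```

In this assumed dictionary the delimiter "-" occupies the third position of every entry, so its frequency probability is 1, while "F" opens only one entry in four.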

[0017] The character evaluation value calculation unit 14 obtains the character evaluation value Bi at each character position of the character string candidate. Bi = Ri × f(Pi), where f(Pi) is a monotonically decreasing function; for example, f(Pi) = 1/Pi. Lowering the character evaluation value Bi for characters with a higher frequency probability Pi reduces the influence that frequently appearing characters have on the choice of the recognized character string. This is because characters with a high appearance frequency (for example, delimiter characters) are contained in many character strings, so relying on such characters alone to infer the correct character string carries a high risk of misrecognition.
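A small numeric sketch shows how the weighting Bi = Ri × f(Pi) plays out with the example choice f(Pi) = 1/Pi given in the text. The function name and the input values are assumptions chosen for illustration.

```python
def character_evaluation(r_i, p_i):
    """B_i = R_i * f(P_i), using the example f(P) = 1 / P from the text."""
    if p_i <= 0:
        # A matched character comes from a dictionary string, so P_i > 0.
        raise ValueError("p_i must be positive")
    return r_i * (1.0 / p_i)


# A delimiter recognized perfectly but present in every dictionary string.
delimiter_score = character_evaluation(r_i=1.0, p_i=1.0)

# A rarer character recognized less confidently (assumed values).
rare_score = character_evaluation(r_i=0.8, p_i=0.25)

print(delimiter_score, rare_score)
```

Even though the rarer character has the lower similarity, it receives the larger evaluation value, which is the intended effect: a match on a character that few dictionary strings share is stronger evidence than a match on a ubiquitous delimiter.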

[0018] The non-candidate character evaluation unit 15 obtains the character evaluation value Bi when no candidate character matches the character string candidate. In that case Bi = −a × (Ai1 − Ain), where a is a constant satisfying 0 < a < 1. For example, if "F" at character position 1 of the character string candidate "FY-38N" is not among the first through n-th candidate characters, the character evaluation value is B1 = −a × (A11 − A1n).
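The penalty for a non-candidate character can be sketched as below. The function name, the value a = 0.5, and the similarity lists are assumptions. One possible reading of the formula, which the text does not state explicitly, is that a wide gap between the first and n-th similarities indicates a confident recognizer, so a dictionary character missing from such a list is penalized more heavily.

```python
def non_candidate_evaluation(similarities, a=0.5):
    """B_i = -a * (A_i1 - A_in) for a position with no matching candidate.

    similarities: recognition similarities A_i1..A_in, best candidate first.
    a: constant with 0 < a < 1 (0.5 is an arbitrary illustrative choice).
    """
    if not 0 < a < 1:
        raise ValueError("a must satisfy 0 < a < 1")
    return -a * (similarities[0] - similarities[-1])


# Confident recognition: large spread between first and last candidate.
confident = non_candidate_evaluation([0.9, 0.7, 0.6, 0.5, 0.4])

# Noisy recognition: all candidates score about the same.
noisy = non_candidate_evaluation([0.55, 0.54, 0.53, 0.52, 0.51])

print(confident, noisy)
```

Under these assumed numbers the noisy position yields a penalty close to zero, so an unreliable position costs the character string candidate very little.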

[0019] The character string evaluation value derivation unit 16 obtains the character string candidate evaluation value C from the character evaluation values Bi of the characters of the character string candidate, as C = ΣBi. The character string candidate selection unit 6 compares the character string candidate evaluation values of the character string candidates and selects the candidate with the largest value. The high-evaluation-value character selection unit 22 selects, from the candidate character sets, the candidate characters whose character evaluation value is at or above a fixed value. Alternatively, it selects the candidate characters ranked within the m-th candidate (1 < m < n).
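Summing the character evaluation values and picking the best character string candidate, together with the threshold variant of high-evaluation-value character selection, might look like this. All names, per-character values, and the threshold are hypothetical.

```python
def string_evaluation(char_values):
    """C = sum of the character evaluation values B_i."""
    return sum(char_values)


def select_best(char_values_by_candidate):
    """Return (string candidate, C) for the candidate with the largest C."""
    scores = {
        candidate: string_evaluation(values)
        for candidate, values in char_values_by_candidate.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


def select_high_value_chars(scored_chars, threshold):
    """Keep candidate characters whose B_i is at or above the threshold."""
    return [char for char, value in scored_chars if value >= threshold]


# Assumed per-position B_i values for two string candidates.
char_values = {
    "FY-38N": [-0.2, 0.5, 1.0, 1.5, 2.0, -0.1],
    "JP-28M": [2.0, 1.6, 1.0, 1.8, 2.0, 2.4],
}
best, score = select_best(char_values)
print(best, score)

# Assumed B_i values for the candidates at one character position.
kept = select_high_value_chars([("J", 2.0), ("I", 0.4), ("L", 1.2)], threshold=1.0)
print(kept)
```

The negative entries for "FY-38N" stand for positions where the non-candidate penalty applied, which pulls its total below that of "JP-28M".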

[0020] The delimiter search unit 23 searches the candidate characters selected by the high-evaluation-value character selection unit 22 for delimiter characters. When the character string candidate evaluation value of the character string candidate output by the character string candidate selection unit 6 is at or below a fixed value and the delimiter search unit 23 has found a delimiter character, the erroneous-insertion inference unit 24 outputs the position of that delimiter character to the delimiter deletion/insertion unit 8 as a delimiter deletion instruction. On the other hand, when the character string candidate evaluation value of the character string candidate output by the character string candidate selection unit 6 is at or above the fixed value, or the delimiter search unit 23 has found no delimiter character, the character string candidate output by the character string candidate selection unit 6 is passed to the forward partial match search unit 25.
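The decision made by the erroneous-insertion inference unit can be sketched as a single function. This is an illustration under assumptions: the names, the delimiter set, and the threshold are hypothetical, a score exactly equal to the threshold is treated as low, and the first delimiter found is reported because the text does not say which one to choose when several are present.

```python
DELIMITERS = {"-", "/"}  # assumed delimiter set for part numbers


def infer_erroneous_insertion(best_score, high_value_chars, threshold):
    """Return the position of a delimiter to delete, or None.

    best_score: evaluation value C of the selected string candidate.
    high_value_chars: one set of high-evaluation-value candidate characters
                      per character position (0-based).
    """
    if best_score > threshold:
        return None  # the selected candidate is trusted; no deletion
    for position, chars in enumerate(high_value_chars):
        if chars & DELIMITERS:
            return position  # delimiter deletion instruction
    return None  # no delimiter found; fall through to the omission check


# Hypothetical input where a stray "-" was written after the first character.
high_value_chars = [{"J"}, {"-"}, {"P"}, {"-", "1"}, {"2"}, {"8"}, {"M"}]

print(infer_erroneous_insertion(1.5, high_value_chars, threshold=3.0))
print(infer_erroneous_insertion(9.0, high_value_chars, threshold=3.0))
```

A return value of `None` corresponds to handing the character string candidate on to the forward partial match search.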

[0021] The forward partial match search unit 25 searches the knowledge dictionary for character strings that form a forward partial (prefix) match with the combination of candidate characters selected by the high-evaluation-value character selection unit 22, and whose first non-matching character in the knowledge dictionary is a delimiter character. When the character string candidate evaluation value of the character string candidate output by the character string candidate selection unit 6 is at or below a fixed value and the forward partial match search unit 25 has found such a character string, the erroneous-omission inference unit 26 outputs the character position of the first non-matching character to the delimiter deletion/insertion unit as a delimiter insertion instruction. On the other hand, when the character string candidate evaluation value of the character string candidate output by the character string candidate selection unit 6 is at or above the fixed value, or the forward partial match search unit 25 has found no such character string, the character string candidate output by the character string candidate selection unit 6 is output as the corrected character string.
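The forward partial match search for an omitted delimiter can be illustrated as follows. The function name, the delimiter set, and the sample data are assumptions, as is the requirement that at least one leading character match before a delimiter counts as omitted.

```python
DELIMITERS = {"-", "/"}  # assumed delimiter set for part numbers


def find_omitted_delimiter(high_value_chars, dictionary):
    """Return (dictionary string, position) where a delimiter seems omitted.

    A dictionary string qualifies when its leading characters match the
    high-evaluation-value candidates and its first non-matching character
    is a delimiter. Positions are 0-based. Returns None if nothing qualifies.
    """
    for entry in dictionary:
        for position, char in enumerate(entry):
            if position < len(high_value_chars) and char in high_value_chars[position]:
                continue  # still inside the matching prefix
            # First non-matching character of this dictionary string.
            if position > 0 and char in DELIMITERS:
                return entry, position  # delimiter insertion instruction
            break
    return None


# Hypothetical input: the writer wrote "JP28M", leaving out the "-".
high_value_chars = [{"J"}, {"P"}, {"2", "Z"}, {"8"}, {"M"}]
dictionary = ["FY-38N", "JP-28M"]

print(find_omitted_delimiter(high_value_chars, dictionary))
```

Here "FY-38N" fails at its first character, while "JP-28M" matches "JP" and then stops at "-", so the sketch reports position 2 of "JP-28M" as the place to insert the delimiter.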

[0022] The delimiter deletion/insertion unit 8 edits the candidate character sets according to the delimiter deletion instruction 28 or the delimiter insertion instruction 29. For a delimiter deletion instruction 28, it deletes the candidate characters at the character position of the delimiter from the candidate character sets and shifts the candidate characters at the subsequent character positions forward by one position. For a delimiter insertion instruction 29, it shifts the candidate characters at and after the character position of the delimiter back by one position and sets the first candidate at the delimiter's character position to the delimiter character. It then outputs the edited candidate character sets to the character string search unit.
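The two edits applied to the candidate character sets amount to list deletion and insertion, as in the sketch below. The names and data are hypothetical. The inserted position holds only the delimiter here, because the text specifies the first candidate at that position but not the remaining ones.

```python
def delete_delimiter(candidate_sets, position):
    """Drop the candidates at `position`; later positions shift forward."""
    return candidate_sets[:position] + candidate_sets[position + 1:]


def insert_delimiter(candidate_sets, position, delimiter):
    """Shift positions from `position` onward back by one and put the
    delimiter in as the first (here: only) candidate at `position`."""
    return candidate_sets[:position] + [[delimiter]] + candidate_sets[position:]


# Hypothetical candidate sets for the handwritten string "JP28".
candidate_sets = [["J", "I"], ["P", "F"], ["2", "Z"], ["8", "B"]]

with_delimiter = insert_delimiter(candidate_sets, 2, "-")
print(with_delimiter)

restored = delete_delimiter(with_delimiter, 2)
print(restored)
```

Both functions return new lists, so the edited candidate character sets can be fed back to the character string search step while the original recognizer output stays unchanged.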

[0023]

[Effects of the Invention] As described above, the character recognition device of the present invention computes the character string evaluation value from the tendency of the recognition similarities and from the appearance frequency of characters in the knowledge dictionary, so the corrected character string can be determined more accurately even when the image to be recognized contains noise. In addition, erroneous-entry inference allows the correct character string to be determined even when the writer has mistakenly inserted or omitted a delimiter character. Performing character recognition in this way improves the recognition rate substantially.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of a character recognition device according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of the character string candidate evaluation value calculation unit of the embodiment.

FIG. 3 is a block diagram showing the configuration of the erroneous-entry inference unit of the embodiment.

FIG. 4 is a diagram showing the output of the character recognition unit of the embodiment.

FIG. 5 is a diagram showing the output of the character string search unit of the embodiment.

FIG. 6 is a configuration diagram of a conventional character recognition device.

[Explanation of Symbols]

1 Form image
2 Character recognition unit
3 Character string search unit
4 Knowledge dictionary
5 Character string candidate evaluation value calculation unit
6 Character string candidate selection unit
7 Erroneous-entry inference unit
8 Delimiter deletion/insertion unit
9 Corrected character string
11 Character string candidate
12 Similarity evaluation unit
13 Character appearance frequency evaluation unit
14 Character evaluation value calculation unit
15 Non-candidate character evaluation unit
16 Character string evaluation value derivation unit
17 Character string candidate evaluation value
21 Candidate character set
22 High-evaluation-value character selection unit
23 Delimiter search unit
24 Erroneous-insertion inference unit
25 Forward partial match search unit
26 Erroneous-omission inference unit
27 Corrected character string
28 Delimiter deletion instruction
29 Delimiter insertion instruction

Continuation of front page

(72) Inventor: Kazuhiro Kayashima, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma-shi, Osaka
(56) References: JP-A-4-120679 (JP, A); JP-A-4-205457 (JP, A); JP-A-5-62022 (JP, A); JP-A-2-116988 (JP, A); JP-A-5-174195 (JP, A); JP-A-2-121078 (JP, A); JP-A-4-77980 (JP, A); JP-A-4-205457 (JP, A)
(58) Field of search (Int. Cl.7, DB name): G06K 9/72

Claims (6)

(57) [Claims]

1. A character recognition device comprising: a character recognition unit that recognizes an input image one character at a time and outputs n candidate characters and their recognition similarities for each character; a character string search unit that searches for character strings in which a character string contained in a knowledge dictionary partially matches a combination of the candidate characters, and outputs character string candidates; a character string candidate evaluation value calculation unit that obtains a character string candidate evaluation value based on the recognition similarities of the n candidate characters at each character position of each character string candidate and on the character appearance frequency at each character position of the character strings contained in the knowledge dictionary; a character string candidate selection unit that selects the character string candidate having the largest character string candidate evaluation value; an erroneous-entry inference unit that infers an entry error by the writer from the selected character string candidate and the candidate character sets; and a delimiter deletion/insertion unit that deletes or inserts an erroneously entered character.
2. The character recognition device according to claim 1, wherein the character string candidate evaluation value calculation unit comprises: a similarity evaluation unit that performs similarity evaluation from the recognition similarities of the n candidate characters at each character position of a character string candidate; a character appearance frequency evaluation unit that evaluates the appearance frequency of characters at each character position of the character strings contained in the knowledge dictionary; a character evaluation value calculation unit that obtains a character evaluation value at each character position of the character string candidate from the similarity evaluation and the character appearance frequency evaluation; and a character string evaluation value derivation unit that obtains a character string evaluation value from the character evaluation values.
3. The character recognition device according to claim 1, wherein the character string candidate evaluation value calculation unit comprises: a similarity evaluation unit that performs similarity evaluation from the recognition similarities of the n candidate characters at each character position of a character string candidate; a character appearance frequency evaluation unit that evaluates the appearance frequency of characters at each character position of the character strings contained in the knowledge dictionary; a non-candidate character evaluation unit that, based on the similarity evaluation and the character appearance frequency evaluation, performs non-candidate character evaluation when a character of the character string candidate has no matching character among the candidate characters; and a character string evaluation value derivation unit that obtains a character string evaluation value from the character evaluation values.
4. The character recognition device according to claim 1, wherein the erroneous-entry inference unit comprises: a high-evaluation-value character selection unit that selects candidate characters whose character evaluation value is at or above a fixed value; a delimiter search unit that searches the candidate characters selected by the high-evaluation-value character selection unit for a delimiter character; and an erroneous-insertion inference unit that, when the character string candidate evaluation value is at or below a fixed value and the delimiter search unit has found a delimiter character, infers that the writer inserted that delimiter character by mistake and deletes, from the candidate character sets, the candidate characters at the character position of that delimiter character.
5. The character recognition device according to claim 1, wherein the erroneous-entry inference unit comprises: a high-evaluation-value character selection unit that selects candidate characters whose character evaluation value is at or above a fixed value; a forward partial match search unit that performs a forward partial match search between the character strings of the knowledge dictionary and the combination of candidate characters selected by the high-evaluation-value character selection unit; and an erroneous-omission inference unit that, when the character string candidate evaluation value is at or below a fixed value and a forward partial match character string has been found, infers that the writer omitted a delimiter character by mistake at the first character position that did not match and inserts a delimiter character into the candidate character sets at that first non-matching character position.
6. The character recognition device according to claim 1, wherein the character string candidate evaluation value calculation unit obtains the character string evaluation value from the product of Ri and f(Pi), where Ri is the recognition similarity of the candidate character at character position i in the character string candidate, Pi is the frequency probability with which each character of the character string candidate appears at the same character position i in the character strings contained in the knowledge dictionary, and f(Pi) is a monotonically decreasing function.

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP06268640A JP3085107B2 (en) 1994-11-01 1994-11-01 Character recognition device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP06268640A JP3085107B2 (en) 1994-11-01 1994-11-01 Character recognition device

Publications (2)

Publication Number Publication Date
JPH08129616A JPH08129616A (en) 1996-05-21
JP3085107B2 true JP3085107B2 (en) 2000-09-04

Family

ID=17461366

Family Applications (1)

Application Number Title Priority Date Filing Date
JP06268640A Expired - Lifetime JP3085107B2 (en) 1994-11-01 1994-11-01 Character recognition device

Country Status (1)

Country Link
JP (1) JP3085107B2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5107157B2 (en) * 2008-06-30 2012-12-26 富士通フロンテック株式会社 Character recognition program, character recognition device, and character recognition method

Also Published As

Publication number Publication date
JPH08129616A (en) 1996-05-21

Similar Documents

Publication Publication Date Title
CN111859921A (en) Text error correction method and device, computer equipment and storage medium
JP3085107B2 (en) Character recognition device
JP2000089786A (en) Method for correcting speech recognition result and apparatus therefor
JPS5854433B2 (en) Difference detection device
JP3255816B2 (en) Character recognition device
JP3975825B2 (en) Character recognition error correction method, apparatus and program
JP3350127B2 (en) Character recognition device
JPH06215184A (en) Labeling device for extracted area
JP4047895B2 (en) Document proofing apparatus and program storage medium
US5689583A (en) Character recognition apparatus using a keyword
JP4318223B2 (en) Document proofing apparatus and program storage medium
JP3157557B2 (en) Character recognition device
JP2827066B2 (en) Post-processing method for character recognition of documents with mixed digit strings
JP2908460B2 (en) Error recognition correction method and apparatus
JP4047894B2 (en) Document proofing apparatus and program storage medium
JP3123181B2 (en) Character recognition device
JPH0290384A (en) Post-processing system for character recognizing device
JPH0757059A (en) Character recognition device
JPH07271921A (en) Character recognizing device and method thereof
JPH0256086A (en) Method for postprocessing for character recognition
JP2746345B2 (en) Post-processing method for character recognition
JPS60138689A (en) Character recognizing method
JPS646514B2 (en)
JPH07152877A (en) English alphabet recognition device
JPH06119497A (en) Character recognizing method

Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20070707

Year of fee payment: 7

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20080707

Year of fee payment: 8

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20090707

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20090707

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20100707

Year of fee payment: 10