JPH02224085A - Character recognizing device - Google Patents

Character recognizing device

Info

Publication number
JPH02224085A
Authority
JP
Japan
Prior art keywords
character
recognition
pattern
post
character recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP1042944A
Other languages
Japanese (ja)
Other versions
JP2856409B2 (en)
Inventor
Yoshiaki Kurosawa
由明 黒沢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Corp
Original Assignee
Toshiba Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toshiba Corp
Priority to JP1042944A
Publication of JPH02224085A
Application granted
Publication of JP2856409B2
Anticipated expiration
Expired - Lifetime

Landscapes

  • Character Discrimination (AREA)

Abstract

PURPOSE: To correct misreadings and secure high reading accuracy by recognizing characters using the character patterns on the form itself. CONSTITUTION: For an input pattern that has first been detected by a character detection part 2 and then processed by a first character recognition part 3 and a post-processing part 4, the final character recognition result is obtained by executing again processing that includes one or both of character detection and second character recognition, using the second character recognition dictionary held in a second dictionary storage part 12. That is, after image input, character detection, first character recognition, and post-processing have been executed, character detection, segmentation, and recognition are executed again, using the character patterns on the input form, for any part whose result is judged to have low reliability. Thus, even when a context or word not covered by the contextual or word knowledge is input, or when low-quality characters are input, misreading is avoided and a high recognition rate is obtained.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Object of the Invention]

(Industrial Application Field) The present invention relates to a character recognition device that corrects misread portions of a character recognition result to obtain a correct recognition result.

(Prior Art) A character recognition device consists of an image input section, a character detection section, a character recognition section, and a post-processing section. It improves the final recognition rate by correcting ambiguities in the character recognition result based on knowledge of context and words.

However, no means has so far been developed to recover the characters that end up misread when a context or word not covered by the contextual or word knowledge is input, or when low-quality characters are input.

(Problem to Be Solved by the Invention) As described above, conventional character recognition devices have the drawback that they offer no remedy for unknown words or low-quality characters. The object of the present invention is to provide a character recognition device that obtains a high overall recognition rate without misreading, even when a context or word not covered by the contextual or word knowledge is input or when low-quality characters are input.

[Structure of the Invention]

(Means for Solving the Problem) The present invention comprises an image input section, a character detection section, a first character recognition section, a post-processing section, and a re-recognition section. The re-recognition section consists of a part that estimates the correct character code of each input character, that is, the correct character, from the outputs of the first character recognition section and the post-processing section; a part that creates a dictionary for second character recognition based on this estimate; and a part that executes the second character recognition.

For an input pattern whose characters have first been detected and which has then undergone first character recognition and post-processing, the final character recognition result is obtained by performing again, on all or part of the pattern, processing that includes one or both of character detection and second character recognition, using the second character recognition dictionary.

(Operation) In the present invention, after image input, character detection, first character recognition, and post-processing have been executed, character detection, segmentation, and recognition can be performed again, using the character patterns on the input form, for portions whose results are judged to have low reliability.

(Embodiment) FIG. 1 is a conceptual block diagram of an embodiment of the present invention. The image data of a form on which characters are printed or written is converted into an electrical signal and input through the image input section 1. Next, the character detection section 2 extracts character lines from this image data and then extracts the individual characters from those lines.

Each extracted character pattern is recognized by the first character recognition section 3, which determines, for example, a plurality of candidate characters as the recognition result. The candidate characters obtained for each character are sent to the post-processing section 4. Using knowledge about context, for example a knowledge dictionary consisting of word data, section 4 estimates from the input recognition result the character string presumed to be correct and outputs it. This result is then sent to the re-recognition section 5, where the processing described next is executed. The final result is output from section 5 and becomes the final character recognition result.

Next, the operation of section 5 is described. FIG. 2 is a detailed block diagram of the re-recognition section.

FIG. 3 shows the flow of data in this embodiment. 20 is the input image, and 21 is the result of character recognition on it. In this example the top three candidates of the recognition result are shown. The circles (○) in the figure mark the correct candidates. Applying contextual knowledge as post-processing corrects this result as shown in 22. Here the word 「人様」 has been recognized as 「人権」, but because 「人様」 is not registered in the word dictionary, the contextual knowledge cannot be used and the error is not corrected. The data of 21 and 22 are input to the correct character estimation section 10 of FIG. 2. The correct character estimation section judges that, among the first-ranked candidate characters of the first recognition result, those changed by post-processing are misread characters and that the post-processed characters are the correct ones. In the example of FIG. 3 it lists the circled portions of 22 and sends them to the second recognition dictionary creation section 11. Based on the input character pattern of each listed character, the second dictionary creation section creates a new character recognition dictionary entry for that character.
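The estimation step described above can be sketched roughly as follows. This is only an illustrative reading of the text, not the patented implementation: the function names `estimate_corrected_positions` and `build_second_dictionary`, the list-of-candidates data layout, and the use of plain strings are all assumptions introduced here.

```python
from typing import Dict, List, Tuple


def estimate_corrected_positions(
    first_candidates: List[List[str]],
    post_processed: str,
) -> List[Tuple[int, str]]:
    """List (position, presumed correct character) for every character
    whose first-ranked candidate was changed by post-processing."""
    listed = []
    for pos, (candidates, fixed) in enumerate(zip(first_candidates, post_processed)):
        if candidates[0] != fixed:
            # The post-processor overrode the top candidate, so the top
            # candidate is treated as a misread and `fixed` as correct.
            listed.append((pos, fixed))
    return listed


def build_second_dictionary(
    listed: List[Tuple[int, str]],
    patterns: List[object],
) -> Dict[str, List[object]]:
    """Register the input pattern at each listed position under the
    character that post-processing judged to be correct."""
    dictionary: Dict[str, List[object]] = {}
    for pos, char in listed:
        dictionary.setdefault(char, []).append(patterns[pos])
    return dictionary
```

For instance, if the top candidates spell "cot" and post-processing outputs "cat", position 1 is listed with the character "a", and the input pattern at that position becomes a dictionary entry for "a".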

This dictionary is shown as 23 in FIG. 3 and is stored in the second recognition dictionary storage section 12.

Meanwhile, the recognition result 21 and the post-processing result 22 are accumulated in the reading result storage section 13. After recognition of the form has finished, or partway through it, the recognition results accumulated in 13 are sent to the second recognition execution determination section 14, which judges for each character whether second recognition is necessary. In the example of FIG. 3, the 「人様」 portion, for which post-processing failed, is selected as the portion to undergo second recognition. The image pattern of the 「人様」 portion is then extracted again by the character detection section 2 and sent to the second recognition section. Using the already registered dictionary 23, the second recognition section correctly recognizes the 「人様」 portion and sends the result to the output section. The output section merges the reading results for the portions that were not sent to the second recognition section with the results for the re-recognized portions and outputs the final result.

If the characters input to the second recognition section are unreadable, or if the second reading result is judged to be less reliable than the first, the reading result of the first character recognition section is output as the final result.
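The fallback rule and the merging done by the output section might look like the sketch below. The `Reading` class, the numeric `reliability` score, and the function names are hypothetical; the text does not say how reliability is measured, only that the first result is kept when the second is unreadable or less reliable.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Reading:
    char: Optional[str]   # None stands for "unreadable"
    reliability: float    # assumed score, higher means more reliable


def choose_final(first: Reading, second: Optional[Reading]) -> Optional[str]:
    """Keep the first reading unless a readable second reading exists
    that is at least as reliable."""
    if second is None or second.char is None:
        return first.char
    if second.reliability < first.reliability:
        return first.char
    return second.char


def merge_results(first: List[Reading], second: Dict[int, Reading]) -> List[Optional[str]]:
    """Combine first-pass readings with the second-pass readings that
    exist only for the re-recognized positions."""
    return [choose_final(r, second.get(i)) for i, r in enumerate(first)]
```

Positions that were never sent to second recognition simply have no entry in `second`, so their first-pass reading passes through unchanged.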

The second recognition uses the following simple method.

First, the pattern 31 from which a dictionary entry is made is thickened to create pattern 32, which is used as the dictionary pattern.

For recognition, the input pattern 33 is thickened to create pattern 34, and the pixel-wise exclusive OR of pattern 34 and the dictionary pattern 32 is computed to create pattern 38. The area of the mismatched portion between 32 and 34, that is, portion 35, is calculated, and its ratio to the area of the matched portion, that is, 37, is computed. The character corresponding to the dictionary pattern with the smallest mismatch is taken as the recognition result.
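A minimal sketch of this thicken-then-XOR matching is given below, using nested lists of 0/1 as bitmaps. Several details are assumptions that the text leaves open: thickening is taken to be a 3x3 dilation, the "matched portion" is taken to be the pixels set in both patterns, and all patterns are assumed to share the same size. The names `thicken`, `mismatch_ratio`, and `recognize` are illustrative only.

```python
from typing import Dict, List

Bitmap = List[List[int]]  # rows of 0/1 pixels, all bitmaps the same size


def thicken(img: Bitmap) -> Bitmap:
    """Dilate every set pixel into its 3x3 neighbourhood (assumed kernel)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for ny in range(max(0, y - 1), min(h, y + 2)):
                    for nx in range(max(0, x - 1), min(w, x + 2)):
                        out[ny][nx] = 1
    return out


def mismatch_ratio(a: Bitmap, b: Bitmap) -> float:
    """Area of the pixel-wise XOR divided by the area set in both patterns."""
    mismatch = sum(p ^ q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    match = sum(p & q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return mismatch / match if match else float("inf")


def recognize(input_pattern: Bitmap, dictionary: Dict[str, Bitmap]) -> str:
    """Return the label of the (already thickened) dictionary pattern
    with the smallest mismatch ratio against the thickened input."""
    thick = thicken(input_pattern)
    return min(dictionary, key=lambda ch: mismatch_ratio(thick, dictionary[ch]))
```

Thickening both sides makes the comparison tolerant of one-pixel shifts and small breaks in a stroke, which is presumably why the dictionary stores the thickened pattern rather than the raw one.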

In this method, both the thickening operation and the pixel-wise exclusive OR can be performed directly on run-length-encoded patterns, so the patterns and the dictionary may also be stored in run-length representation.
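To illustrate why run-length data needs no decoding here, the sketch below handles a single image row stored as half-open `(start, end)` runs of black pixels. The representation and the function names are assumptions, and only horizontal thickening is shown; vertical thickening would additionally merge the runs of neighbouring rows.

```python
from collections import Counter
from typing import List, Tuple

Runs = List[Tuple[int, int]]  # half-open [start, end) runs of black pixels


def xor_runs(a: Runs, b: Runs) -> Runs:
    """XOR two run-length rows. Every run edge flips the pixel value,
    so edges present in both rows cancel and the rest delimit the result."""
    counts = Counter()
    for start, end in list(a) + list(b):
        counts[start] += 1
        counts[end] += 1
    edges = sorted(x for x, n in counts.items() if n % 2 == 1)
    return list(zip(edges[0::2], edges[1::2]))


def widen_runs(runs: Runs, width: int, r: int = 1) -> Runs:
    """Thicken a row horizontally by r pixels and merge runs that now touch."""
    merged: Runs = []
    for start, end in sorted(runs):
        start, end = max(0, start - r), min(width, end + r)
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def run_area(runs: Runs) -> int:
    """Number of black pixels represented by the runs."""
    return sum(end - start for start, end in runs)
```

The mismatch area of a row is then `run_area(xor_runs(a, b))`, computed without ever expanding the row into individual pixels.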

FIG. 5 is a diagram for explaining another embodiment.

Here, 42, 43, and 44 are each part of the patterns stored in the second recognition dictionary storage section. 41 is a pattern that was misread because three characters touch one another, so that the individual characters could not be separated and the pattern was either left pending or read by force. The re-recognition section takes this pattern as input, overlays each dictionary pattern on its left-hand portion, and examines the degree of match. The portion 45 that overlaps the best-matching dictionary pattern 42 is cut out of pattern 41 as one character. In the same way, portion 46 matching pattern 43 and portion 47 matching pattern 44 are cut out in turn, so that each character is separated from pattern 41.
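The left-to-right cutting procedure can be sketched as a greedy template match. This is a simplified illustration under stated assumptions: the dictionary patterns and the touching pattern share the same height, each template is aligned exactly at the current left edge with no search over small offsets, the match score is the fraction of agreeing pixels, and no minimum score is enforced. `match_score` and `segment_touching` are hypothetical names.

```python
from typing import Dict, List, Tuple

Bitmap = List[List[int]]  # rows of 0/1 pixels


def match_score(image: Bitmap, template: Bitmap, x0: int) -> float:
    """Fraction of pixels that agree when the template is laid over
    the image with its left edge at column x0."""
    h, w = len(template), len(template[0])
    agree = sum(
        image[y][x0 + x] == template[y][x] for y in range(h) for x in range(w)
    )
    return agree / (h * w)


def segment_touching(
    image: Bitmap, dictionary: Dict[str, Bitmap]
) -> List[Tuple[str, int, int]]:
    """Cut a touching-character pattern into (label, start, end) column
    spans by repeatedly matching dictionary patterns at the left edge."""
    width = len(image[0])
    x0, result = 0, []
    while x0 < width:
        # Only templates that still fit in the remaining columns qualify.
        fitting = {ch: t for ch, t in dictionary.items() if x0 + len(t[0]) <= width}
        if not fitting:
            break
        best = max(fitting, key=lambda ch: match_score(image, fitting[ch], x0))
        w = len(fitting[best][0])
        result.append((best, x0, x0 + w))
        x0 += w  # continue with the part to the right of the cut
    return result
```

Because the cut width comes from the matched dictionary pattern rather than from gaps in the ink, the procedure does not depend on any white space between the characters.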

This method makes it possible to reliably cut out even intricately intertwined touching characters. The cut-out patterns may then be checked again, as confirmation, by applying the first character recognition method to them.

In the first embodiment, the correct character was estimated from the first character recognition result and the post-processing. In a second embodiment, the post-processing section need not be used; instead, the estimation may be based on an evaluation value for the reliability of recognition in the first character recognition result, for example the similarity.
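One possible reading of this variant is to treat first-pass results whose similarity clears a threshold as presumed correct and everything else as a candidate for re-recognition. The threshold value of 0.9, the `(character, similarity)` tuple layout, and the function name `split_by_similarity` are assumptions made for illustration; the text does not specify them.

```python
from typing import List, Tuple


def split_by_similarity(
    readings: List[Tuple[str, float]], threshold: float = 0.9
) -> Tuple[List[Tuple[int, str]], List[int]]:
    """Return (trusted, doubtful): trusted holds (position, character)
    pairs usable as dictionary sources, doubtful holds the positions
    that remain candidates for second recognition."""
    trusted, doubtful = [], []
    for pos, (char, similarity) in enumerate(readings):
        if similarity >= threshold:
            trusted.append((pos, char))
        else:
            doubtful.append(pos)
    return trusted, doubtful
```

The trusted list plays the role that the post-processed characters play in the first embodiment, supplying patterns for the second recognition dictionary.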

Note that the present invention is not limited to the above embodiments.

The recognition methods of the first and second recognition sections may be of any kind and are not limited to those of the above embodiments.

The first and second recognition methods may be identical, and any post-processing method may be used. Re-recognition may be executed at any time. The decision about which portions to re-recognize need not rely on the post-processing result alone; the reliability of the first character recognition result can also be used as part of the evaluation value.

All characters may also be made subject to re-recognition. The dictionary used for recognition may be the original pattern itself, without thickening. The dictionary for second recognition may be accumulated across multiple forms.

[Effect of the Invention]

According to the present invention, misreadings that arise when the quality of the input characters is low, or when a word absent from the word dictionary used in post-processing is input, are handled by performing recognition with the character patterns in the form itself. Misreadings can therefore be corrected, and as a result high reading accuracy is secured, which is of great practical benefit.

[Brief Description of the Drawings]

FIG. 1 is an overall configuration diagram of an embodiment of the present invention; FIG. 2 is a configuration diagram of the re-recognition section of the present invention; FIG. 3 shows the data flow of the present invention; FIG. 4 shows the matching between an original pattern and a dictionary pattern; and FIG. 5 shows how a touching-character pattern is segmented.
1: image input section; 2: character detection section; 3: first character recognition section; 4: post-processing section; 5: re-recognition section.
Agents: Patent Attorneys Kensuke Norichika (則近憲佑) and Mitsuyuki Matsuyama (松山光之)

Claims (1)

[Claims] A character recognition device comprising: an image input section for inputting an image containing character patterns; a character detection section that detects and cuts out characters from the input image; a first character recognition section that recognizes the cut-out characters; a post-processing section that corrects misread portions of the recognition result using knowledge about context; an estimation section that estimates misread characters by comparing the post-processing result with the recognition result; a dictionary creation section that, based on this estimate, creates a second character recognition dictionary from the character patterns of the input image; and means for performing at least one of character detection and second character recognition using the second character recognition dictionary.
JP1042944A 1989-02-27 1989-02-27 Character recognition apparatus and method Expired - Lifetime JP2856409B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP1042944A JP2856409B2 (en) 1989-02-27 1989-02-27 Character recognition apparatus and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP1042944A JP2856409B2 (en) 1989-02-27 1989-02-27 Character recognition apparatus and method

Publications (2)

Publication Number Publication Date
JPH02224085A JPH02224085A (en) 1990-09-06
JP2856409B2 JP2856409B2 (en) 1999-02-10

Family

ID=12650123

Family Applications (1)

Application Number Title Priority Date Filing Date
JP1042944A Expired - Lifetime JP2856409B2 (en) 1989-02-27 1989-02-27 Character recognition apparatus and method

Country Status (1)

Country Link
JP (1) JP2856409B2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04188383A (en) * 1990-11-22 1992-07-06 Nec Corp Character code knowledge processing system
JP2015153240A (en) * 2014-02-17 2015-08-24 株式会社東芝 Pattern recognition apparatus, pattern recognition method, and program

Also Published As

Publication number Publication date
JP2856409B2 (en) 1999-02-10

Similar Documents

Publication Publication Date Title
JP2737734B2 (en) Fingerprint classifier
JP2000353215A (en) Character recognition device and recording medium where character recognizing program is recorded
JPH01246678A (en) Pattern recognizing device
JP3099797B2 (en) Character recognition device
JP3092576B2 (en) Character recognition device
JPH02224085A (en) Character recognizing device
JPH0520794B2 (en)
JPH11328315A (en) Character recognizing device
JP4083723B2 (en) Image processing device
JP2671984B2 (en) Information recognition device
JPS5835674A (en) Extracting method for feature of online hand-written character
JP2001147988A (en) Method and device for recognizing character
JP2851865B2 (en) Character recognition device
JP2002074269A (en) Method for recognizing character
JP2925270B2 (en) Character reader
KR100367580B1 (en) Device for recognizing on-line character of stroke order independence
JPS60138689A (en) Character recognizing method
JPH06187506A (en) Character recognizer
JPH08202830A (en) Character recognition system
JPH01265378A (en) European character recognizing system
JPH0816720A (en) Character recognition device
JPS62236088A (en) Recognizing device of character with sonant mark and p-sonant mark
JPH04216171A (en) Method for extracting contour vector
JPH0485684A (en) Character recognition device
JPH022156B2 (en)

Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20071127

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20081127

Year of fee payment: 10

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20091127

Year of fee payment: 11

EXPY Cancellation because of completion of term