JPH09138834A

JPH09138834A - Character recognition device/method

Info

Publication number: JPH09138834A
Application number: JP7296571A
Authority: JP
Inventors: Daisuke Nishiwaki; 大輔西脇
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1995-11-15
Filing date: 1995-11-15
Publication date: 1997-05-27
Anticipated expiration: 2015-11-15
Also published as: JP2973898B2

Abstract

PROBLEM TO BE SOLVED: To perform post-processing with high precision by regenerating, for a character left unmatched during word matching, a pattern binarized with a different threshold, running individual character recognition on it, and performing word matching again with reference to the new candidates obtained. SOLUTION: A photoelectrically converted character pattern is input and stored in an image data storage means 301. A binarization/normalization means 101 binarizes and normalizes the pattern, using a single threshold over the whole image of one character. A feature extraction means 102 extracts feature quantities from the input pattern and stores them in a feature data storage means 302. A character recognition means 103 identifies the character from the feature quantities. A post-processing means 104 then matches the plural candidates obtained by the character recognition means 103 against knowledge. For a character that remains unmatched, a pattern with a changed binarization threshold is regenerated, individual character recognition is run again, and word matching is performed with reference to the new candidates.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a character recognition device that automatically reads input characters.

[0002]

[Prior Art] When characters are to be recognized automatically with high accuracy, applying post-processing such as word matching to the candidates output by individual character recognition can make it possible to read a word as a unit even when not every character is fully legible. Such post-processing is actively used to improve reading accuracy in situations where knowledge such as names and addresses is available.

[0003] Two specific examples of address reading are described below. First, in Japanese Unexamined Patent Application Publication No. H2-109187, "Post-processing method for character recognition of solidly written addresses", the number of matches between the character recognition results and the address dictionary is computed when collating the individual character recognition results with the address dictionary. The address name with the largest number of matches is taken from the address dictionary and used as the recognition result.

[0004] Next, in Japanese Unexamined Patent Application Publication No. S62-103785, "Character Reader", the candidates output by individual character recognition define the matching range. Matching against the address dictionary is performed within that range, and when matching completes, the matching result is used as the recognition result.

[0005]

[Problems to Be Solved by the Invention] However, a decision based simply on the match count, as in the first example above, can fail. Suppose, as shown in FIG. 6, that "東町" (Higashimachi) is correct, that "東" happens not to be read by individual character recognition, and that both "東町" and "本町" (Honmachi) exist in the same area. The match count for the individual recognition result "?町" is then 1 for both "東町" and "本町", so the address cannot be determined. This is because the match count treats the unreadable character as a don't-care.
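The tie described above can be reproduced with a minimal sketch. The function names `match_count` and `best_matches` are illustrative and do not come from the cited publication, and the unreadable character is written as `?` by assumption.

```python
def match_count(recognized, word):
    """Count positions where the recognized character equals the dictionary character."""
    if len(recognized) != len(word):
        return 0
    return sum(1 for r, w in zip(recognized, word) if r == w)


def best_matches(recognized, dictionary):
    """Return every dictionary word that attains the highest match count."""
    scores = {word: match_count(recognized, word) for word in dictionary}
    top = max(scores.values())
    return [word for word in dictionary if scores[word] == top]


# "?" stands for the character that individual recognition could not read.
tied = best_matches("?町", ["東町", "本町"])
print(tied)  # both words reach the same count, so the address stays ambiguous
```

Because the unread position contributes nothing to either score, no amount of dictionary lookup can break the tie; some extra evidence about the unread pattern itself is needed.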

[0006] In the second example, by contrast, the matching range is the set of candidates output by individual character recognition, and a candidate to be matched is promoted if it appears in that set. Compared with the first example, matching accuracy can be expected to improve because the unreadable character is searched for among the lower-ranked candidates of individual character recognition. For instance, with "東町" correct as before, even if "東" does not appear as the first candidate of individual character recognition as shown in FIG. 7, the presence of "東" among the lower candidates makes "東町" quite likely, so matching is more accurate than in the first example. Even so, when both "東" and "本" appear among the lower candidates, the result is not certain even if "東" ranks higher, because the ranking of lower candidates is generally unreliable. If "本" ranks higher, or if the correct character is absent from the candidates, a correct matching result cannot be obtained, just as in the first example. The problem is especially pronounced when, as here, several mutually similar candidate characters could fill the gap. When a character is crushed (its strokes merged), feature extraction also fails and the result becomes correspondingly unstable.
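A small sketch of matching within the candidate range shows where this second approach stops helping. The function name `words_within_candidates` and the candidate lists are assumptions made for illustration, not data from the cited publication.

```python
def words_within_candidates(candidates, dictionary):
    """Return dictionary words whose every character appears among the
    candidates output for the corresponding position."""
    return [
        word
        for word in dictionary
        if len(word) == len(candidates)
        and all(ch in cands for ch, cands in zip(word, candidates))
    ]


dictionary = ["東町", "本町"]

# Only "東" appears among the lower candidates: the word is determined.
unique = words_within_candidates([["束", "東"], ["町"]], dictionary)

# Both "東" and "本" appear: matching alone cannot choose between them.
ambiguous = words_within_candidates([["束", "東", "本"], ["町"]], dictionary)

print(unique, ambiguous)
```

In the ambiguous case the only remaining evidence is the rank order of the lower candidates, which the text above notes is unreliable.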

[0007] This is because, when a character pattern that the individual character recognition unit could not read can be filled in as a result of matching, nothing confirms that the pattern may really be filled in with that character.

[0008] An object of the present invention is to solve the above problems of these matching methods and to provide a character recognition device capable of more accurate post-processing.

[0009]

[Means for Solving the Problems] The character recognition method of the present invention binarizes an input character pattern and performs word matching, and is characterized in that, for a character left unmatched during word matching, a pattern with a changed binarization threshold is regenerated, individual character recognition is executed on it, and word matching is performed with reference to the new candidates of the resulting determination.

[0010] The character recognition device of the present invention comprises: image data storage means for storing a signal obtained by photoelectric conversion means that converts an input character pattern into an electric signal; binarization/normalization means for binarizing the image data stored in the image data storage means and performing processing that absorbs variation within the same character type and emphasizes differences between different character types; feature extraction means for extracting feature quantities effective for character discrimination from the input character pattern processed by the binarization/normalization means; feature data storage means for storing the feature quantities extracted by the feature extraction means; character recognition means for determining, from the feature quantities stored in the feature data storage means, which character type the input character pattern corresponds to, and outputting plural candidates as the determination result in descending order of likelihood; graphic information extraction means for extracting graphic information of the input character pattern stored in the image data storage means; and post-processing means for matching the plural candidates obtained for the input character pattern by the character recognition means against the determination results for the preceding and following patterns, using knowledge about the input character patterns, and confirming one of the plural candidates as the recognition result. When the post-processing means cannot confirm a recognition result, the graphic information extraction means extracts the graphic information of the input character pattern stored in the image data storage means, the binarization/normalization means changes the binarization threshold until that graphic information differs from its current value, and feature extraction by the feature extraction means, determination by the character recognition means, and correction by the post-processing means are repeated on the resulting pattern until the post-processing means confirms a recognition result.

[0011]

[Operation] The operation of the present invention is described with reference to FIG. 4.

[0012] The graph in FIG. 4 is the density histogram of the character pattern "東", which was left unmatched during word matching in the examples of FIG. 6 and FIG. 7. The horizontal axis is the density level and the vertical axis is the frequency.

[0013] Let s1 be the binarization threshold used in the first individual character recognition; the binary pattern obtained with it is shown in FIG. 5(a). The strokes are crushed, and at this point the individual character recognition means cannot reliably decide between "東" and "本". The binarization threshold is therefore moved to s2 in FIG. 4, in the direction that makes the line width thinner. The resulting binary pattern is shown in FIG. 5(b). In this pattern the character is not crushed and the strokes come out cleanly, so stable features are extracted and the individual character recognition means can correctly determine the character to be "東".
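The effect of moving the threshold from s1 to s2 can be illustrated on a single scan line. This is a sketch under stated assumptions: the density values are invented, a higher density level is taken to mean darker ink, and `stroke_runs` is a hypothetical helper rather than part of the disclosed device.

```python
def stroke_runs(row, threshold):
    """Return the lengths of consecutive runs of pixels whose density is
    at or above the threshold (that is, pixels binarized as ink)."""
    runs, current = [], 0
    for density in row:
        if density >= threshold:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


# One scan line crossing two neighbouring strokes with a blurred gap between them.
row = [0, 60, 150, 220, 150, 120, 150, 220, 150, 60, 0]
s1, s2 = 100, 180  # assumed values for the first and the changed threshold

merged = stroke_runs(row, s1)     # the two strokes fuse into one wide run
separated = stroke_runs(row, s2)  # the strokes come apart and become thinner
print(merged, separated)
```

At the lower threshold the blurred gap is counted as ink and the strokes are crushed together; at the higher one the gap drops out, which corresponds to the change from FIG. 5(a) to FIG. 5(b).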

[0014] Thus, according to the present invention, for a character left unmatched during word matching, a pattern with a changed binarization threshold is regenerated, individual character recognition is executed, and matching is performed with reference to the new candidates of the resulting determination. Correct characters can therefore be matched even in cases where the conventional techniques would match incorrectly.

[0015]

[Embodiments of the Invention] Next, a first embodiment of the present invention is described with reference to the drawings.

[0016] FIG. 1 is a block diagram of a character recognition device according to an embodiment of the present invention, and FIG. 2 is a flowchart showing its operation.

[0017] The character recognition device of the present invention comprises: image data storage means 301 for storing a signal obtained by photoelectric conversion means that converts an input character pattern into an electric signal; binarization/normalization means 101 for binarizing the image data stored in the image data storage means 301 and performing processing that absorbs variation within the same character type and emphasizes differences between different character types; feature extraction means 102 for extracting feature quantities effective for character discrimination from the input pattern processed by the binarization/normalization means 101; feature data storage means 302 for storing the feature quantities extracted by the feature extraction means 102; character recognition means 103 for determining, from the feature quantities stored in the feature data storage means 302, which character type the input pattern corresponds to, and outputting plural candidates as the determination result in descending order of likelihood; graphic information extraction means 105 for extracting graphic information of the input pattern stored in the image data storage means 301; post-processing means 104 for matching the plural candidates obtained for the input pattern by the character recognition means 103 against the determination results for the preceding and following patterns, using knowledge about the input patterns, and confirming one of the plural candidates as the recognition result; a data bus 401 for exchanging data among the binarization/normalization means 101, the image data storage means 301, the feature extraction means 102, the feature data storage means 302, the character recognition means 103, and the graphic information extraction means 105; and control means 201 for controlling the binarization/normalization means 101, the image data storage means 301, the feature extraction means 102, the feature data storage means 302, the character recognition means 103, and the graphic information extraction means 105.

[0018] First, a photoelectrically converted character pattern is input and stored in the image data storage means 301 (step 21).

[0019] Next, the binarization/normalization means 101 performs binarization followed by normalization (step 22). For binarization, a single threshold may be used over the whole image of one character. The threshold can be determined, for example, by the discriminant analysis method described in Nobuyuki Otsu, "Mathematical Studies on Feature Extraction in Pattern Recognition", Electrotechnical Laboratory Research Report No. 818 (1991.7). Normalization is intended to make discrimination easier; it suffices that variation within the same character type is suppressed and differences between different character types are made clear. It can be realized, for example, with J. Tsukumo et al., "Classification of Handprinted Chinese Characters Using Non-linear Normalization and Correlation Methods", Proc. of 9th ICPR (1990.7).
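The discriminant analysis method chooses the threshold that maximizes the between-class variance of the density histogram. The following is a minimal sketch of that general idea, not the exact procedure of the cited report; the function name `otsu_threshold` and the sample histogram are assumptions.

```python
def otsu_threshold(hist):
    """Return the level t that maximizes the between-class variance when the
    histogram is split into levels <= t and levels > t."""
    total = sum(hist)
    weighted_total = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the lower class
    sum0 = 0    # sum of levels in the lower class
    for t, count in enumerate(hist):
        w0 += count
        sum0 += t * count
        w1 = total - w0
        if w0 == 0:
            continue  # lower class still empty
        if w1 == 0:
            break     # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (weighted_total - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


# Assumed 8-level histogram: background clustered at levels 0-1, ink at 6-7.
hist = [5, 5, 0, 0, 0, 0, 5, 5]
t = otsu_threshold(hist)
print(t)
```

Any level in the empty valley separates the two clusters equally well; this sketch keeps the first such level it meets.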

[0020] Next, feature extraction is performed on the binarized and normalized character pattern (step 23). Any feature quantities effective for character discrimination may be extracted; for example, the method of Hamanaka et al., "On the Consistency of Nonlinear Normalization and Feature Extraction in Handwritten Kanji Recognition", IEICE Technical Report PRU91-85 (1991.11), can be used. The features extracted for the character pattern are stored in the feature data storage means 302.

[0021] Next, the character recognition means 103 discriminates the character using the feature quantities stored in the feature data storage means 302 (step 24); that is, it determines from the extracted feature data which character the pattern is. If the feature quantities exemplified for step 23 are used, a suitable method is, for example, Jun Tsukumo, "Improvement of the Directional Pattern Matching Method and Its Application to Handwritten Kanji Recognition", IEICE Technical Report PRU90-20 (1990.7). Plural recognition results, for example 10 candidates, are output in descending order of confidence.

[0022] Next, the post-processing means 104 matches the recognition results against knowledge (step 25). Here the recognition results are verified word by word using the knowledge, and the individual per-character recognition results are corrected. In doing so, characters that do not agree with the knowledge are detected. Finally, the matching result is output (step 26).

[0023] An example of the processing procedure in step 25 is described with reference to FIG. 3. FIG. 3 is a flowchart showing one implementation of matching the individual per-character recognition results against knowledge.

[0024] First, the candidates output by individual character recognition are matched against the knowledge (step 31).

[0025] Next, it is determined whether all the characters to be matched are present among the candidates output by individual character recognition (step 32). That is, when matching against the knowledge, it is checked whether every character is among the candidates of the individual character recognition results. If every character is among the candidates, processing ends. Otherwise, that is, when a character that the knowledge would substitute is absent from the recognition candidates, graphic information is extracted from the character pattern at that position (step 33). The line width of the character, for example, is extracted as this graphic information.
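The check in step 32 amounts to finding the positions of a dictionary word whose character is missing from the candidate list for that position. A minimal sketch follows; the name `unmatched_positions` and the sample candidates are illustrative assumptions.

```python
def unmatched_positions(word, candidates):
    """Return the indices at which the dictionary word's character is absent
    from the candidates output by individual character recognition."""
    return [
        index
        for index, (ch, cands) in enumerate(zip(word, candidates))
        if ch not in cands
    ]


# "東" is missing from the first position's candidates, so position 0 is
# the one whose pattern would go on to graphic information extraction.
todo = unmatched_positions("東町", [["本", "束"], ["町", "叮"]])

# Every character is present: processing would end here.
done = unmatched_positions("東町", [["束", "東"], ["町"]])
print(todo, done)
```

An empty result corresponds to the branch where processing ends; a non-empty result lists the patterns to be re-binarized.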

[0026] Next, the binarization threshold for the character pattern at that position is changed according to the character line width (step 34). If the line is thick, the threshold is moved in the direction that makes it thinner; if the line is thin, in the direction that makes it thicker. The line width is then extracted (step 35) and detected again (step 36). This is repeated until the line width changes (loop 1). The step size of the threshold change is arbitrary. Once a change in line width is detected, the pattern is submitted to individual character recognition again (step 37), the candidates of the determination result are output (step 38), and word matching is performed again (step 31); this is repeated until the character to be matched appears among the candidates (loop 2). When the character to be matched appears, the matching result is output. If it never appears among the candidates however far the threshold is changed, that position is marked unrecognizable and output together with the positions that were matched (step 39).
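The two nested loops can be sketched on a one-dimensional toy image. Everything below is an illustration under stated assumptions: the line width is approximated by the longest ink run on a scan line, a higher density level means darker ink (so a positive step thins the line), and `toy_recognize` is a stand-in for the individual character recognition means, not a real recognizer.

```python
def binarize(image, threshold):
    return [density >= threshold for density in image]


def longest_run(pattern):
    """Crude line-width measure: the longest run of ink pixels."""
    best = current = 0
    for on in pattern:
        current = current + 1 if on else 0
        best = max(best, current)
    return best


def toy_recognize(pattern):
    """Stand-in recognizer: crushed (wide) strokes are misread."""
    return ["本", "束"] if longest_run(pattern) >= 5 else ["東", "束"]


def retry_unmatched(image, wanted, threshold, step, limits, recognize):
    """Return (character, threshold), or (None, threshold) if unrecognizable."""
    lo, hi = limits
    width = longest_run(binarize(image, threshold))
    while lo <= threshold + step <= hi:          # loop 2
        new_width = width
        # Loop 1: move the threshold until the line width changes.
        while new_width == width and lo <= threshold + step <= hi:
            threshold += step
            pattern = binarize(image, threshold)
            new_width = longest_run(pattern)
        if new_width == width:
            break                                # threshold range exhausted
        width = new_width
        # Steps 37-38: re-recognize, then look for the character to be matched.
        hits = [c for c in recognize(pattern) if c in wanted]
        if hits:
            return hits[0], threshold
    return None, threshold                       # step 39: unrecognizable


image = [0, 60, 150, 220, 150, 120, 150, 220, 150, 60, 0]
found = retry_unmatched(image, {"東"}, 100, 20, (0, 255), toy_recognize)
missing = retry_unmatched(image, {"西"}, 100, 20, (0, 255), toy_recognize)
print(found, missing)
```

Skipping thresholds that leave the line width unchanged avoids re-running recognition on patterns that are effectively the same as the previous one.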

[0027] The graphic information is not limited to the character line width described above; the number of connected line segments, the number of loops, or a combination of these may be used.
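One way to obtain such counts from a binary pattern is flood filling. This sketch counts connected ink components and treats a loop as a background region that does not reach the image border. That reading of "loop", the 8-connectivity for ink, the 4-connectivity for background, and all function names are assumptions made here, and the pattern is assumed to have a blank margin.

```python
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def regions(grid, value, neighbors):
    """Flood-fill every region of `value`; return one flag per region that is
    True when the region touches the image border."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    flags = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or (r, c) in seen:
                continue
            seen.add((r, c))
            stack = [(r, c)]
            touches_border = False
            while stack:
                y, x = stack.pop()
                if y in (0, rows - 1) or x in (0, cols - 1):
                    touches_border = True
                for dy, dx in neighbors:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == value
                            and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            flags.append(touches_border)
    return flags


def connected_components(grid):
    return len(regions(grid, 1, N8))


def loop_count(grid):
    return sum(1 for touches in regions(grid, 0, N4) if not touches)


ring = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
bars = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
print(connected_components(ring), loop_count(ring))
print(connected_components(bars), loop_count(bars))
```

A crushed character tends to lose loops or merge components, so a change in either count signals that the new threshold produced a structurally different pattern.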

[0028] Next, a second embodiment of the present invention is described. In the first embodiment, matching ends and the result is output at step 32 if all the characters matched by knowledge are present among the candidates of the individual character recognition results. In the second embodiment, by contrast, the binarization and re-recognition of the pattern described above are repeated until every matched character rises to first place in the individual character recognition results. As in the first embodiment, the graphic information detected in order to change the binarization threshold when repeating individual character recognition may be the character line width, the number of connected components, the number of loops, or any combination of these.

[0029] Next, a third embodiment of the present invention is described. In this embodiment, when plural characters that can be matched by knowledge appear among the candidates at step 32, the binarization and re-recognition of the pattern described above are repeated until one of those characters rises to first place in the individual character recognition results. As in the first embodiment, the graphic information detected in order to change the binarization threshold when repeating individual character recognition may be the character line width, the number of connected components, the number of loops, or any combination of these.
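Viewed per character position, the three embodiments differ mainly in when re-recognition stops. The predicates below are one possible reading of those stop conditions, written for a single position; the function names and the candidate lists are assumptions for illustration.

```python
def stop_first(candidates, wanted):
    """First embodiment: stop once a character allowed by the knowledge
    appears anywhere among the candidates."""
    return any(c in wanted for c in candidates)


def stop_second(candidates, wanted):
    """Second embodiment: stop only when the matched character is ranked first."""
    return bool(candidates) and candidates[0] in wanted


def stop_third(candidates, wanted):
    """Third embodiment: if several matchable characters appear, continue
    until one of them is ranked first; otherwise behave like the first."""
    hits = [c for c in candidates if c in wanted]
    if len(hits) > 1:
        return candidates[0] in wanted
    return len(hits) == 1


# Candidates are ordered from most to least likely.
print(stop_first(["束", "東"], {"東"}))              # present but not first
print(stop_second(["束", "東"], {"東"}))
print(stop_third(["束", "東", "本"], {"東", "本"}))  # two rivals, neither first
print(stop_third(["東", "本"], {"東", "本"}))
```

The stricter conditions trade extra recognition passes for more confidence that the filled-in character is supported by the image and not only by the dictionary.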

[0030]

[Effects of the Invention] As described above, when the character to be matched does not appear among the candidates of the individual character recognition results, the present invention extracts graphic information of the pattern at that position, changes the binarization threshold until that information changes, and repeats recognition until the character appears among the candidates. This makes full correspondence with the knowledge possible and markedly improves character recognition accuracy.

[Brief description of the drawings]

FIG. 1 is a block diagram of a character recognition device according to an embodiment of the present invention.

FIG. 2 is a flowchart showing an example of the operation of the character recognition device shown in FIG. 1.

FIG. 3 is a flowchart showing an example of the operation of the character recognition device shown in FIG. 1 during knowledge matching.

FIG. 4 is an explanatory diagram illustrating the principle of the present invention.

FIG. 5 is an explanatory diagram illustrating the principle of the present invention.

FIG. 6 is a diagram showing a determination method of a conventional character recognition device.

FIG. 7 is a diagram showing another determination method of a conventional character recognition device.

[Explanation of symbols]

101 Binarization/normalization means
102 Feature extraction means
103 Character recognition means
104 Post-processing means
105 Graphic information extraction means
201 Control means
301 Image data storage means
302 Feature data storage means
401 Data bus

Claims

[Claims]

1. A character recognition method that binarizes an input character pattern and performs word matching, characterized in that, for a character left unmatched during word matching, a pattern with a changed binarization threshold is regenerated, individual character recognition is executed on it, and word matching is performed with reference to the new candidates of the resulting determination.

2. The character recognition method according to claim 1, wherein the binarization threshold is changed according to the character line width, the number of connected line segments, or the number of loops.

3. A character recognition device comprising: image data storage means for storing a signal obtained by photoelectric conversion means that converts an input character pattern into an electric signal; binarization/normalization means for binarizing the image data stored in the image data storage means and performing processing that absorbs variation within the same character type and emphasizes differences between different character types; feature extraction means for extracting feature quantities effective for character discrimination from the input character pattern processed by the binarization/normalization means; feature data storage means for storing the feature quantities extracted by the feature extraction means; character recognition means for determining, from the feature quantities stored in the feature data storage means, which character type the input character pattern corresponds to, and outputting plural candidates as the determination result in descending order of likelihood; graphic information extraction means for extracting graphic information of the input character pattern stored in the image data storage means; and post-processing means for matching the plural candidates obtained for the input character pattern by the character recognition means against the determination results for the preceding and following patterns, using knowledge about the input character patterns, and confirming one of the plural candidates as the recognition result; characterized in that, when the post-processing means cannot confirm a recognition result, the graphic information extraction means extracts the graphic information of the input character pattern stored in the image data storage means, the binarization/normalization means changes the binarization threshold until that graphic information differs from its current value, and feature extraction by the feature extraction means, determination by the character recognition means, and correction by the post-processing means are repeated on the resulting pattern until the post-processing means confirms a recognition result.

4. The character recognition device according to claim 3, wherein, when the post-processing means adopts a determination result other than the first candidate output by the character recognition means, the graphic information extraction means extracts the graphic information of the input character pattern stored in the image data storage means, the binarization/normalization means changes the binarization threshold until that graphic information differs from its current value, and feature extraction by the feature extraction means, determination by the character recognition means, and correction by the post-processing means are repeated on the resulting pattern until the post-processing means confirms the first candidate of the determination result as the recognition result.

5. The character recognition device according to claim 3, wherein, when the post-processing means adopts a candidate of the determination result output by the character recognition means and plural matchable characters are present among the candidates, the graphic information extraction means extracts the graphic information of the input character pattern stored in the image data storage means, the binarization/normalization means changes the binarization threshold until that graphic information differs from its current value, and feature extraction by the feature extraction means, determination by the character recognition means, and correction by the post-processing means are repeated on the resulting pattern until the post-processing means confirms the first candidate of the determination result as the recognition result.

6. The character recognition device according to claim 3, 4 or 5, wherein the graphic information extraction means extracts, as the graphic information, the line width, the number of connected components, or the number of loops of the image data.