JP2973898B2 - Character recognition method and device

Info

Publication number: JP2973898B2
Application number: JP7296571A
Authority: JP
Inventors: 大輔西脇
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1995-11-15
Filing date: 1995-11-15
Publication date: 1999-11-08
Anticipated expiration: 2015-11-15
Also published as: JPH09138834A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a character recognition device that automatically reads input characters.

[0002]

[Prior art] When characters are to be recognized automatically with high accuracy, post-processing such as word matching is applied to the candidates output by individual character recognition. Even when not every character can be read completely, the text can sometimes still be read at the word level. This approach is therefore actively used to improve reading accuracy wherever knowledge such as names and addresses is available.

[0003] Two specific examples of address reading are described below. First, in Japanese Patent Application Laid-Open No. 2-109187, "Post-processing method for character recognition of addresses written without delimiters", the number of matches between the individual character recognition results and each entry of an address dictionary is calculated, and the address name with the largest number of matches is taken from the address dictionary and used as the recognition result.

[0004] Second, in Japanese Patent Application Laid-Open No. Sho 62-103785, "Character reading device", the candidates output by individual character recognition define the matching range. Matching against the address dictionary is performed within that range, and when matching completes, the matching result is used as the recognition result.

[0005]

[Problems to be solved by the invention] However, judging by the number of matches alone, as in the first example, fails in cases such as the one shown in FIG. 6. Suppose "東町" (Higashimachi) is the correct answer, the character "東" happens not to be read by individual character recognition, and the same area contains both "東町" and "本町" (Honmachi). For the individual character recognition result "?町", the number of matches is 1 for both "東町" and "本町", so the address cannot be determined. This is because the match count treats the portion that individual character recognition could not read as a don't-care.
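The tie described above can be reproduced with a minimal sketch. The function names and the use of `None` for an unread character are assumptions made for illustration; the cited publication does not specify its data representation.

```python
def match_count(recognized, word):
    """Count positions where the recognized character equals the dictionary character.

    `None` marks a character that individual recognition could not read.
    It never matches, so that position is effectively a don't-care.
    """
    if len(recognized) != len(word):
        return 0
    return sum(1 for r, w in zip(recognized, word) if r is not None and r == w)


def best_by_match_count(recognized, dictionary):
    """Return every dictionary word that attains the highest match count, and that count."""
    scores = {word: match_count(recognized, word) for word in dictionary}
    top = max(scores.values())
    return [word for word in dictionary if scores[word] == top], top


# The FIG. 6 situation: "東" was not read, only "町" was.
recognized = [None, "町"]
dictionary = ["東町", "本町"]
winners, score = best_by_match_count(recognized, dictionary)
```

Both dictionary entries reach the same count, so the count alone cannot select one of them.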

[0006] In the second example, by contrast, the matching range is the set of candidates output by individual character recognition, and any candidate within it that should be matched is promoted. Compared with the first example, matching accuracy can be expected to improve because the unread portion is searched for among the lower-ranked candidates of the individual character recognition result. For example, with "東町" correct as before, even if "東" does not appear as the first-ranked individual recognition result, as shown in FIG. 7, the presence of "東" among the lower-ranked candidates makes "東町" quite likely, and matching is more accurate than in the first example. Even so, when both "東" and "本" are present among the lower-ranked candidates, the result is not certain even if "東" is ranked higher, because the ranking of lower candidates is generally unreliable. When "本" is ranked higher, or when the correct character is not among the candidates at all, a correct matching result cannot be obtained, just as in the first example. The problem is especially pronounced when, as in this example, several candidate characters can fill the gap and they resemble one another. When the character strokes are crushed together, feature extraction also works poorly, which makes the ranking still less stable.
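A minimal sketch of matching within the candidate range follows. The candidate lists, including the competing character "車", are hypothetical values chosen for illustration, and the function name is an assumption.

```python
def words_within_candidates(candidates_per_char, dictionary):
    """Return dictionary words whose every character appears among the
    ranked candidates for the corresponding position."""
    hits = []
    for word in dictionary:
        if len(word) != len(candidates_per_char):
            continue
        if all(ch in cands for ch, cands in zip(word, candidates_per_char)):
            hits.append(word)
    return hits


dictionary = ["東町", "本町"]

# FIG. 7-like case: "東" is not ranked first but is among the lower candidates.
unambiguous = words_within_candidates([["車", "東"], ["町"]], dictionary)

# Problem case: both "東" and "本" are among the lower candidates.
ambiguous = words_within_candidates([["車", "東", "本"], ["町"]], dictionary)
```

In the second call both words survive, and choosing between them would depend on the order of the lower-ranked candidates, which the text describes as unreliable.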

[0007] This is because, when a character pattern that the individual character recognition unit could not read can be filled in as a result of matching, nothing verifies whether it is actually correct to fill in that character.

[0008] An object of the present invention is to solve the problems of the matching methods described above and to provide a character recognition device capable of more accurate post-processing.

[0009]

[Means for solving the problems] The character recognition device of the present invention comprises: image data storage means for storing a signal obtained by photoelectric conversion means that converts an input character pattern into an electric signal; binarization/normalization means for binarizing the image data stored in the image data storage means and performing processing that absorbs variation within the same character type and emphasizes differences between different character types; feature extraction means for extracting, from the input character pattern processed by the binarization/normalization means, features effective for character discrimination; feature data storage means for storing the features extracted by the feature extraction means; character recognition means for determining, based on the features stored in the feature data storage means, which character type the input character pattern corresponds to, and outputting multiple candidates as the determination result in descending order of likelihood; graphic information extraction means for extracting graphic information of the input character pattern stored in the image data storage means; and post-processing means for matching the multiple candidates obtained by the character recognition means for the input character pattern against the determination results for the preceding and following patterns, using knowledge about the input character patterns, and fixing one character from the candidates as the recognition result. The device is characterized in that, when the post-processing means cannot fix the recognition result, the following steps 1 to 3 are performed.

1. Change the binarization threshold of the binarization/normalization means until the graphic information extracted by the graphic information extraction means changes.
2. Have the character recognition means recognize the character pattern that was binarized and normalized at the point where the change in the graphic information was detected.
3. Repeat steps 1 and 2 until the post-processing means fixes the recognition result.

[0010]

[0011]

[Operation] The operation of the present invention is described with reference to FIG. 4.

[0012] The graph in FIG. 4 is the density histogram of the character pattern "東", which was left unmatched during word matching in the examples of FIG. 6 and FIG. 7. The horizontal axis represents the density level and the vertical axis represents the frequency.

[0013] Let s1 be the binarization threshold used in the first individual character recognition. FIG. 5(a) shows the binary pattern obtained with it. The strokes are crushed together, so at this point the individual character recognition means cannot reliably tell whether the character is "東" or "本". The binarization threshold is therefore changed to s2 in FIG. 4, in the direction that makes the lines thinner. FIG. 5(b) shows the binary pattern obtained with s2. In this pattern the strokes come out cleanly with no crushing, so stable features are extracted and the individual character recognition means can correctly determine "東".
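The effect of moving the threshold from s1 to s2 can be illustrated with one scan line. This is a sketch under stated assumptions: density is taken to increase with darkness, a pixel counts as ink when its density is at least the threshold, and the density values and the two thresholds are invented for the example.

```python
def binarize(gray, threshold):
    """Mark a pixel as ink (1) when its density is at least `threshold`."""
    return [[1 if v >= threshold else 0 for v in row] for row in gray]


def ink_runs(row):
    """Count maximal runs of consecutive ink pixels in one row."""
    runs, prev = 0, 0
    for v in row:
        if v == 1 and prev == 0:
            runs += 1
        prev = v
    return runs


# One scan line crossing two strokes (density 200) separated by a blurred gap (density 90).
gray = [[10, 200, 200, 90, 200, 200, 10]]

s1, s2 = 80, 150  # hypothetical threshold values
merged = binarize(gray, s1)[0]     # the gap is classified as ink: strokes are crushed
separated = binarize(gray, s2)[0]  # the gap is classified as background
```

With the lower threshold the two strokes merge into a single run; with the higher threshold they separate into two runs.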

[0014] Thus, according to the present invention, for a character left unmatched during word matching, a pattern is generated again with a different binarization threshold, individual character recognition is executed on it, and matching is performed by referring to the new candidates obtained. The correct character can therefore be matched even in cases where the conventional techniques would match incorrectly.

[0015]

[Embodiments of the invention] Next, a first embodiment of the present invention is described with reference to the drawings.

[0016] FIG. 1 is a block diagram of a character recognition device according to an embodiment of the present invention, and FIG. 2 is a flowchart of its operation.

[0017] The character recognition device of the present invention comprises: image data storage means 301, which stores a signal obtained by photoelectric conversion means that converts an input character pattern into an electric signal; binarization/normalization means 101, which binarizes the image data stored in the image data storage means 301 and performs processing that absorbs variation within the same character type and emphasizes differences between different character types; feature extraction means 102, which extracts features effective for character discrimination from the input pattern processed by the binarization/normalization means 101; feature data storage means 302, which stores the features extracted by the feature extraction means 102; character recognition means 103, which determines, based on the features stored in the feature data storage means 302, which character type the input pattern corresponds to and outputs multiple candidates as the determination result in descending order of likelihood; graphic information extraction means 105, which extracts graphic information of the input pattern stored in the image data storage means 301; post-processing means 104, which matches the multiple candidates obtained by the character recognition means 103 for the input pattern against the determination results for the preceding and following patterns, using knowledge about the input patterns, and fixes one character from the candidates as the recognition result; a data bus 401, which carries data among the binarization/normalization means 101, image data storage means 301, feature extraction means 102, feature data storage means 302, character recognition means 103, and graphic information extraction means 105; and control means 201, which controls the binarization/normalization means 101, image data storage means 301, feature extraction means 102, feature data storage means 302, character recognition means 103, and graphic information extraction means 105.

[0018] First, a photoelectrically converted character pattern is input and stored in the image data storage means 301 (step 21).

[0019] Next, the binarization/normalization means 101 performs binarization and then normalization (step 22). Binarization may use a single threshold for the entire image of one character. The threshold can be determined, for example, by the discriminant analysis method described in "Nobuyuki Otsu: Mathematical Study on Feature Extraction in Pattern Recognition, Electrotechnical Laboratory Research Report No. 818 (1991.7)". Normalization is intended to make discrimination easier, so it suffices that variation within the same character type is suppressed and differences between different character types are made clear. It can be implemented, for example, using "J. Tsukumo, et al.: Classification of Handprinted Chinese Characters Using Non-linear Normalization and Correlation Methods, Proc. of 9th ICPR (1990.7)".
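The discriminant analysis method chooses the threshold that maximizes the between-class variance of the density histogram. The following is a minimal pure-Python sketch of that criterion, not the patent's implementation; the function name and the toy histogram are assumptions.

```python
def otsu_threshold(histogram):
    """Return the level t that maximizes between-class variance.

    `histogram[i]` is the number of pixels with level i. Levels <= t form
    one class and levels > t form the other.
    """
    total = sum(histogram)
    total_sum = sum(i * h for i, h in enumerate(histogram))
    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of the lower class
    sum0 = 0  # sum of levels in the lower class
    for t, h in enumerate(histogram):
        w0 += h
        sum0 += t * h
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance is undefined
        mean0 = sum0 / w0
        mean1 = (total_sum - sum0) / w1
        # Proportional to between-class variance (constant factor 1/total**2 dropped).
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


# Toy 8-level histogram: a background mode at levels 0-1 and an ink mode at levels 6-7.
hist = [40, 30, 0, 0, 0, 0, 20, 10]
t = otsu_threshold(hist)
```

For this histogram the selected level falls in the empty valley between the two modes, so every background pixel lands in one class and every ink pixel in the other.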

[0020] Next, feature extraction is performed on the character pattern that has been binarized and normalized (step 23). Any features effective for character discrimination may be used. For example, the extraction can be implemented using "Hamanaka et al.: On the consistency of nonlinear normalization and feature extraction in handwritten kanji recognition, IEICE Technical Report, PRU91-85 (1991.11)". The features extracted for the character pattern are stored in the feature data storage means 302.

[0021] Next, the character recognition means 103 discriminates the character using the features stored in the feature data storage means 302 (step 24). That is, it determines from the extracted feature data which character the pattern is. If the features exemplified in step 23 are used, a suitable method is, for example, "Jun Tsukumo: Improvement of the Directional Pattern Matching Method and Its Application to Handwritten Kanji Recognition, IEICE Technical Report, PUR90-20 (1990.7)". Multiple recognition results, for example 10 candidates, are output in descending order of reliability.
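The ranked candidate output can be sketched as follows. The score table is hypothetical; in the device the scores would come from the recognizer, whose scoring function is not reproduced here.

```python
import heapq


def top_candidates(scores, n=10):
    """Return up to n (character, score) pairs, highest score first."""
    return heapq.nlargest(n, scores.items(), key=lambda item: item[1])


# Hypothetical reliability scores for one character pattern.
scores = {"車": 0.81, "東": 0.78, "本": 0.74, "束": 0.60}
ranked = top_candidates(scores, n=3)
```

The later matching step only needs the ordered characters, which can be taken as `[ch for ch, _ in ranked]`.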

[0022] Next, the post-processing means 104 matches the recognition results against the knowledge (step 25). Here the recognition results are verified one word at a time using the knowledge, and the individual recognition result for each character is corrected. Characters that do not agree with the knowledge are detected in the process. Finally, the matching result is output (step 26).

[0023] An example of the processing procedure in step 25 is described with reference to FIG. 3. FIG. 3 is a flowchart showing one way of implementing the matching between the individual recognition result for each character and the knowledge.

[0024] First, the candidates output by individual recognition are matched against the knowledge (step 31).

[0025] Next, it is determined whether all the characters to be matched are present among the candidates output by individual character recognition (step 32). That is, when matching against the knowledge, a check is made whether every character is among the candidates of the individual character recognition result. If every character is among the candidates, processing ends. If, on the other hand, matching leaves a position that does not correspond to the knowledge, that is, the character that the knowledge would substitute is not among the recognition candidates, graphic information is extracted from the character pattern at that position (step 33). The line width of the character, for example, is extracted as this graphic information.
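The text does not say how the line width is measured. One simple proxy, shown here purely as an assumption, is the mean length of horizontal ink runs in the binary pattern. It overestimates the width of horizontal strokes, so a practical estimator would need to be more careful.

```python
def mean_run_length(binary):
    """Average length of horizontal ink runs, used as a rough stroke-width proxy."""
    runs = []
    for row in binary:
        length = 0
        for v in row:
            if v:
                length += 1
            elif length:
                runs.append(length)
                length = 0
        if length:  # run that reaches the right edge
            runs.append(length)
    return sum(runs) / len(runs) if runs else 0.0


# Two toy binary patterns containing one vertical stroke each.
thick = [[0, 1, 1, 1, 1, 0],
         [0, 1, 1, 1, 1, 0]]
thin = [[0, 0, 1, 1, 0, 0],
        [0, 0, 1, 1, 0, 0]]
thick_width = mean_run_length(thick)
thin_width = mean_run_length(thin)
```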

[0026] Next, the binarization threshold for the character pattern at that position is changed according to the character line width (step 34). If the lines are thick, the threshold is moved in the direction that makes them thinner; if the lines are thin, it is moved in the direction that makes them thicker. The line width is then extracted (step 35) and checked again (step 36). This is repeated until the line width changes (loop 1). The size of the threshold step is arbitrary. Once a change in line width is detected, the pattern is passed to individual character recognition again (step 37), the candidates of the determination result are output (step 38), and word matching is performed again (step 31). This is repeated until the character to be matched appears among the candidates (loop 2). When the character to be matched appears, the matching result is output. If the character never appears among the candidates however far the threshold is changed, that position is treated as unrecognizable and is output together with the positions that were matched (step 39).
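The two nested loops can be sketched as below. Everything here is illustrative: the helper names, the toy recognizer, the use of an ink-run count as the graphic information, the threshold limits, and the mapping of step numbers in the comments are assumptions, and the caller chooses the sign of `step` according to whether the lines should become thinner or thicker.

```python
def binarize(gray, threshold):
    """Ink (1) where density is at least `threshold` (larger density = darker, assumed)."""
    return [[1 if v >= threshold else 0 for v in row] for row in gray]


def ink_run_count(pattern):
    """Toy graphic information: total number of horizontal ink runs."""
    runs = 0
    for row in pattern:
        prev = 0
        for v in row:
            if v and not prev:
                runs += 1
            prev = v
    return runs


def toy_recognize(pattern):
    """Stand-in recognizer: proposes "東" only when the strokes are separated."""
    return ["東", "本"] if ink_run_count(pattern) >= 2 else ["車", "束"]


def retry_with_new_threshold(gray, wanted, threshold, step, limits,
                             measure=ink_run_count, recognize=toy_recognize):
    """Return (threshold, candidates) once a wanted character appears, else None."""
    lo, hi = limits
    pattern = binarize(gray, threshold)
    while True:                                      # loop 2
        reference = measure(pattern)                 # step 33
        while True:                                  # loop 1
            threshold += step                        # step 34
            if not lo <= threshold <= hi:
                return None                          # step 39: unrecognizable
            pattern = binarize(gray, threshold)      # step 35
            if measure(pattern) != reference:        # step 36
                break
        candidates = recognize(pattern)              # steps 37-38
        if any(ch in candidates for ch in wanted):   # steps 31-32
            return threshold, candidates


gray = [[10, 200, 200, 90, 200, 200, 10]]
result = retry_with_new_threshold(gray, wanted={"東", "本"}, threshold=80,
                                  step=10, limits=(0, 255))
failed = retry_with_new_threshold([[10, 10]], wanted={"東"}, threshold=80,
                                  step=10, limits=(0, 255))
```

Re-recognizing only when the graphic information has changed avoids running the recognizer on patterns that are effectively identical to the previous one.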

[0027] The graphic information need not be the character line width described above. The number of connected line segments, the number of loops, or a combination of these may also be used.
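Connected regions and loops can be counted with a flood fill. This sketch assumes 4-connectivity for both ink and background, which is a simplification, and counts a loop as a background region that does not reach the image border.

```python
def count_components(grid, value):
    """Count 4-connected regions of cells equal to `value`."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if grid[y][x] != value or seen[y][x]:
                continue
            count += 1
            stack = [(y, x)]
            seen[y][x] = True
            while stack:
                cy, cx = stack.pop()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and grid[ny][nx] == value):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
    return count


def count_loops(binary):
    """Count holes: background regions not connected to the image border."""
    w = len(binary[0])
    # Pad with background so all outer background forms a single region.
    padded = [[0] * (w + 2)] + [[0] + list(row) + [0] for row in binary] + [[0] * (w + 2)]
    return count_components(padded, 0) - 1


ring = [[1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]]
crushed = [[1, 1, 1],
           [1, 1, 1],
           [1, 1, 1]]
```

A crushed pattern loses its holes, so a change in the loop count can signal that a threshold change has opened or closed a stroke gap.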

[0028] Next, a second embodiment of the present invention is described. In the first embodiment, if all the characters matched by the knowledge in step 32 were present among the candidates of the individual character recognition result, matching ended and the result was output. In the second embodiment, by contrast, the binarization and re-recognition of the pattern described above are repeated until every matched character rises to first place in the individual character recognition result. As in the first embodiment, the graphic information detected in order to change the binarization threshold when repeating individual character recognition may be the character line width, the number of connected components, the number of loops, or any combination of these.

[0029] Next, a third embodiment of the present invention is described. In this embodiment, when multiple characters that can be matched by the knowledge appear among the candidates in step 32, the binarization and re-recognition of the pattern described above are repeated until one of those characters rises to first place in the individual character recognition result. As in the first embodiment, the graphic information detected in order to change the binarization threshold when repeating individual character recognition may be the character line width, the number of connected components, the number of loops, or any combination of these.
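The three embodiments differ only in the condition that ends the retry loop for a character position. A minimal sketch of the three conditions follows; the function names and candidate lists are hypothetical.

```python
def in_candidates(candidates, target):
    """First embodiment: the character required by the knowledge is among the candidates."""
    return target in candidates


def ranked_first(candidates, target):
    """Second embodiment: the matched character is the first-ranked candidate."""
    return bool(candidates) and candidates[0] == target


def any_ranked_first(candidates, targets):
    """Third embodiment: one of several matchable characters is first-ranked."""
    return bool(candidates) and candidates[0] in targets


before = ["車", "東", "本"]  # hypothetical ranking from the crushed pattern
after = ["東", "本", "車"]   # hypothetical ranking after re-binarization
```

With the `before` ranking only the first condition holds for "東", so the second and third embodiments would keep adjusting the threshold until a ranking like `after` is obtained.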

[0030]

[Effects of the invention] As described above, when a character to be matched does not appear among the candidates of the individual character recognition result, the present invention extracts graphic information of the pattern at that position, changes the binarization threshold until that information changes, and repeats recognition until the character appears among the candidates. This makes full correspondence with the knowledge possible and markedly improves character recognition accuracy.

[Brief description of the drawings]

FIG. 1 is a block diagram of a character recognition device according to one embodiment of the present invention.

FIG. 2 is a flowchart showing an example of the operation of the character recognition device shown in FIG. 1.

FIG. 3 is a flowchart showing an example of the operation of the character recognition device shown in FIG. 1 during knowledge matching.

FIG. 4 is an explanatory diagram for explaining the principle of the present invention.

FIG. 5 is an explanatory diagram for explaining the principle of the present invention.

FIG. 6 is a diagram showing a determination method of a conventional character recognition device.

FIG. 7 is a diagram showing another determination method of a conventional character recognition device.
[Explanation of symbols]

101 Binarization/normalization means
102 Feature extraction means
103 Character recognition means
104 Post-processing means
105 Graphic information extraction means
201 Control means
301 Image data storage means
302 Feature data storage means
401 Data bus

Claims

(57) [Claims]

1. A character recognition device comprising: image data storage means for storing a signal obtained by photoelectric conversion means that converts an input character pattern into an electric signal; binarization/normalization means for binarizing the image data stored in the image data storage means and performing processing that absorbs variation within the same character type and emphasizes differences between different character types; feature extraction means for extracting, from the input character pattern processed by the binarization/normalization means, features effective for character discrimination; feature data storage means for storing the features extracted by the feature extraction means; character recognition means for determining, based on the features stored in the feature data storage means, which character type the input character pattern corresponds to, and outputting multiple candidates as the determination result in descending order of likelihood; graphic information extraction means for extracting graphic information of the input character pattern stored in the image data storage means; and post-processing means for matching the multiple candidates obtained by the character recognition means for the input character pattern against the determination results for the preceding and following patterns, using knowledge about the input character patterns, and fixing one character from the candidates as the recognition result, wherein, when the post-processing means cannot fix the recognition result, the following steps 1 to 3 are performed:
    1. Change the binarization threshold of the binarization/normalization means until the graphic information extracted by the graphic information extraction means changes.
    2. Have the character recognition means recognize the character pattern that was binarized and normalized at the point where the change in the graphic information was detected.
    3. Repeat steps 1 and 2 until the post-processing means fixes the recognition result.

2. The character recognition device according to claim 1, wherein, when the post-processing means adopts a determination result output by the character recognition means other than the first-ranked candidate, the following steps 1 to 3 are performed:
    1. Change the binarization threshold of the binarization/normalization means until the graphic information extracted by the graphic information extraction means changes.
    2. Have the character recognition means recognize the character pattern that was binarized and normalized at the point where the change in the graphic information was detected.
    3. Repeat steps 1 and 2, processing the candidates until the post-processing means fixes the recognition result.

3. The character recognition device according to claim 1, wherein, when the post-processing means adopts a candidate from the determination result output by the character recognition means and the candidates contain multiple matchable characters, the following steps 1 to 3 are performed:
    1. Change the binarization threshold of the binarization/normalization means until the graphic information extracted by the graphic information extraction means changes.
    2. Have the character recognition means recognize the character pattern that was binarized and normalized at the point where the change in the graphic information was detected.
    3. Repeat steps 1 and 2, processing the candidates until the post-processing means fixes the recognition result.

4. The character recognition device according to claim 1, wherein the graphic information extraction means extracts the line width, the number of connected components, or the number of loops of the image data as the graphic information.