JPH09138834A - Character recognition device/method - Google Patents

Character recognition device/method

Info

Publication number
JPH09138834A
Authority
JP
Japan
Prior art keywords
character
pattern
character recognition
binarization
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP7296571A
Other languages
Japanese (ja)
Other versions
JP2973898B2 (en
Inventor
Daisuke Nishiwaki
大輔 西脇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp
Priority to JP7296571A
Publication of JPH09138834A
Application granted
Publication of JP2973898B2
Anticipated expiration
Expired - Lifetime

Landscapes

  • Character Discrimination (AREA)

Abstract

PROBLEM TO BE SOLVED: To perform post-processing with high accuracy by regenerating, for a character left unmatched during word collation, a pattern binarized with a changed threshold, executing individual character recognition on it, and performing word collation with reference to the new candidates obtained. SOLUTION: A photoelectrically converted character pattern is input and stored in an image data storage means 301. A binarization/normalization means 101 binarizes and normalizes the pattern, using a single threshold for the whole image of one character. A feature extraction means 102 extracts feature quantities from the input pattern and stores them in a feature data storage means 302. A character recognition means 103 discriminates the character using the feature quantities. A post-processing means 104 then collates the multiple candidates obtained by the character recognition means 103 against knowledge. For a character left unmatched, a pattern binarized with a changed threshold is generated again, individual character recognition is executed, and word collation is performed with reference to the new candidates of the discrimination result.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a character recognition device that automatically reads input characters.

[0002]

[Prior Art] When characters are to be recognized automatically with high accuracy, applying post-processing such as word collation to the candidates output by individual character recognition can allow a word to be read as a unit even when not every character is fully readable. Such post-processing is actively used to improve reading accuracy wherever knowledge such as names and addresses is available.

[0003] Two concrete examples of address reading are described below. First, in Japanese Unexamined Patent Publication No. Hei 2-109187, "Post-processing method for character recognition of continuously written addresses", the individual character recognition results are collated with an address dictionary by counting the number of matches between the recognition results and each dictionary entry. The address name with the largest number of matches is taken from the address dictionary and used as the recognition result.

[0004] Next, in Japanese Unexamined Patent Publication No. Sho 62-103785, "Character reading device", the candidates output by individual character recognition define the collation range. Collation with the address dictionary is performed within that range, and when collation is complete the collation result is used as the recognition result.

[0005]

[Problems to Be Solved by the Invention] However, when the decision is based simply on the number of matches, as in the first example above, the address can become impossible to determine. For example, as shown in FIG. 6, suppose that "東町" (Higashimachi) is the correct answer, that "東" happens not to be read by individual character recognition, and that both "東町" and "本町" (Honmachi) exist in the same area. The number of matches for the individual character recognition result "?町" is then 1 for both "東町" and "本町", so the address cannot be identified. This is because the match count treats the unreadable character as a don't-care.
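The tie described above can be reproduced with a minimal sketch. The function names `match_count` and `best_entries`, and the use of `None` for an unreadable character, are illustrative assumptions and do not come from the cited publication.

```python
def match_count(recognized, entry):
    """Count positions where the recognized character equals the dictionary character.

    None marks a character that individual recognition could not read,
    so it never contributes to the count (it acts as a don't-care).
    """
    if len(recognized) != len(entry):
        return 0
    return sum(1 for r, e in zip(recognized, entry) if r is not None and r == e)


def best_entries(recognized, dictionary):
    """Return every dictionary entry that attains the highest match count."""
    scores = {entry: match_count(recognized, entry) for entry in dictionary}
    top = max(scores.values())
    return [entry for entry, score in scores.items() if score == top]


# "東" was not read, so the individual recognition result is "?町".
recognized = [None, "町"]
tied = best_entries(recognized, ["東町", "本町"])
```

Both entries score 1, so `tied` contains both address names and the match count alone cannot choose between them.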

[0006] In contrast, the second method takes the candidates output by individual character recognition as the collation range and promotes a candidate if the character to be matched is among them. Compared with the first example, collation accuracy can be expected to improve because the unreadable portion is searched for among the lower-ranked candidates of the individual character recognition result. For example, if "東町" is the correct answer as in the previous example and, as shown in FIG. 7, "東" does not appear in first place of the individual character recognition result, the presence of "東" among the lower candidates makes "東町" quite likely, so collation is more accurate than in the first example. Even in this case, however, when both "東" and "本" are among the lower candidates, the result is not certain even if "東" is ranked higher of the two, because the ranking of lower candidates is generally unreliable. If "本" is ranked higher, or if the correct character is absent from the candidates, a correct collation result cannot be obtained, just as in the first example. The problem is especially pronounced in cases like this one, where several candidate characters can fill the gap and they resemble one another. When the character is crushed, feature extraction also fails and the result becomes correspondingly unstable.

[0007] This is because, when a character pattern that the individual character recognition unit could not read turns out to be completable by collation, nothing verifies whether it is really appropriate to fill in that character.

[0008] An object of the present invention is to solve the problems of the collation methods described above and to provide a character recognition device capable of more accurate post-processing.

[0009]

[Means for Solving the Problems] The character recognition method of the present invention is a character recognition method that binarizes an input character pattern and performs word collation, characterized in that, for a character left unmatched during word collation, a pattern binarized with a changed threshold is generated again, individual character recognition is executed on it, and word collation is performed with reference to the new candidates of the discrimination result obtained.

[0010] The character recognition device of the present invention comprises: an image data storage means that stores the signal obtained by a photoelectric conversion means that converts an input character pattern into an electric signal; a binarization/normalization means that binarizes the image data stored in the image data storage means and applies processing that absorbs variation within the same character type and emphasizes differences between different character types; a feature extraction means that extracts, from the input character pattern processed by the binarization/normalization means, feature quantities effective for discriminating characters; a feature data storage means that stores the feature quantities extracted by the feature extraction means; a character recognition means that determines, from the feature quantities stored in the feature data storage means, which character type the input character pattern corresponds to and outputs multiple candidates as the determination result in descending order of likelihood; a graphic information extraction means that extracts graphic information of the input character pattern stored in the image data storage means; and a post-processing means that collates the multiple candidates obtained by the character recognition means for the input character pattern against the determination results for the patterns before and after it, using knowledge about input character patterns, and fixes one character from the multiple candidates as the recognition result. The device is characterized in that, when the post-processing means cannot fix a recognition result, the graphic information of the input character pattern stored in the image data storage means is extracted by the graphic information extraction means, the binarization threshold is changed in the binarization/normalization means until the graphic information differs from its current value, and, for the pattern thus obtained, feature extraction by the feature extraction means, determination by the character recognition means, and correction by the post-processing means are repeated until the post-processing means fixes a recognition result.

[0011]

[Operation] The operation of the present invention is described with reference to FIG. 4.

[0012] The graph in FIG. 4 is the density histogram of the character pattern "東", which was left unmatched during word collation in the examples cited in FIG. 6 and FIG. 7. The horizontal axis represents the density level and the vertical axis the frequency.

[0013] Let s1 be the binarization threshold used in the first individual character recognition; the binary pattern obtained with it is shown in FIG. 5(a). The strokes are crushed, and at this point the individual character recognition means cannot reliably decide between "東" and "本". The binarization threshold is therefore changed to s2 in FIG. 4, in the direction that makes the line width thinner. The binary pattern obtained with s2 is shown in FIG. 5(b). In this pattern the strokes come out cleanly with no crushing, so stable features are extracted and, as a result, the individual character recognition means can correctly determine the character to be "東".
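The effect of moving the threshold from s1 to s2 can be illustrated with a small sketch. It assumes that a higher density value means darker ink and that a pixel counts as ink when its density is at least the threshold; the toy image, the values chosen for `s1` and `s2`, and all function names are hypothetical.

```python
def density_histogram(gray, levels):
    """Frequency of each density level, as plotted in a density histogram."""
    hist = [0] * levels
    for row in gray:
        for px in row:
            hist[px] += 1
    return hist


def binarize(gray, threshold):
    """Mark a pixel as ink (1) when its density is at least the threshold."""
    return [[1 if px >= threshold else 0 for px in row] for row in gray]


def ink_pixels(binary):
    """Number of ink pixels in a binary pattern."""
    return sum(sum(row) for row in binary)


# A vertical stroke: dark core (9) with lighter blurred edges (6) on a blank background (0).
gray = [
    [0, 6, 9, 6, 0],
    [0, 6, 9, 6, 0],
    [0, 6, 9, 6, 0],
]
hist = density_histogram(gray, levels=10)

s1, s2 = 5, 8                 # assumed thresholds; s2 lies on the side that thins strokes
crushed = binarize(gray, s1)  # blurred edges are kept, so the stroke is 3 pixels wide
thinned = binarize(gray, s2)  # only the core survives, so the stroke is 1 pixel wide
```

With the higher threshold the blurred edge pixels drop out, which is the kind of thinning that separates strokes that had merged at s1.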

[0014] Thus, according to the present invention, for a character left unmatched during word collation, a pattern binarized with a changed threshold is generated again, individual character recognition is executed, and collation is performed with reference to the new candidates of the discrimination result obtained. Correct characters can therefore be collated even in cases where the conventional techniques would collate incorrectly.

[0015]

[Embodiments of the Invention] Next, a first embodiment of the present invention is described with reference to the drawings.

[0016] FIG. 1 is a block diagram of a character recognition device showing an embodiment of the present invention, and FIG. 2 is a flowchart showing its operation.

[0017] The character recognition device of the present invention comprises: an image data storage means 301 that stores the signal obtained by a photoelectric conversion means that converts an input character pattern into an electric signal; a binarization/normalization means 101 that binarizes the image data stored in the image data storage means 301 and applies processing that absorbs variation within the same character type and emphasizes differences between different character types; a feature extraction means 102 that extracts, from the input pattern processed by the binarization/normalization means 101, feature quantities effective for discriminating characters; a feature data storage means 302 that stores the feature quantities extracted by the feature extraction means 102; a character recognition means 103 that determines, from the feature quantities stored in the feature data storage means 302, which character type the input pattern corresponds to and outputs multiple candidates as the determination result in descending order of likelihood; a graphic information extraction means 105 that extracts graphic information of the input pattern stored in the image data storage means 301; a post-processing means 104 that collates the multiple candidates obtained by the character recognition means 103 for the input pattern against the determination results for the patterns before and after it, using knowledge about input patterns, and fixes one character from the multiple candidates as the recognition result; a data bus 401 that carries data among the binarization/normalization means 101, the image data storage means 301, the feature extraction means 102, the feature data storage means 302, the character recognition means 103, and the graphic information extraction means 105; and a control means 201 that controls the binarization/normalization means 101, the image data storage means 301, the feature extraction means 102, the feature data storage means 302, the character recognition means 103, and the graphic information extraction means 105.

[0018] First, a photoelectrically converted character pattern is input and stored in the image data storage means 301 (step 21).

[0019] Next, the binarization/normalization means 101 performs binarization and then normalization (step 22). For binarization, the same threshold may be used for the entire image of one character. The threshold can be determined, for example, with the discriminant analysis method described in "N. Otsu: Mathematical studies on feature extraction in pattern recognition, Researches of the Electrotechnical Laboratory, No. 818 (1991.7)". Normalization is intended to make discrimination easier: it only needs to suppress variation within the same character type and make the differences between different character types clear. It can be realized, for example, with "J. Tsukumo, et al.: Classification of Handprinted Chinese Characters Using Non-linear Normalization and Correlation Methods, Proc. of 9th ICPR (1990.7)".
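As a rough illustration of threshold selection by discriminant analysis, the following sketch picks the level that maximizes the between-class variance of a density histogram. It is a generic textbook formulation written for this example, not code taken from the cited report, and the histogram values are made up.

```python
def otsu_threshold(hist):
    """Return the level t that best splits hist into classes [0..t] and [t+1..].

    The split is chosen to maximize the between-class variance
    w0 * w1 * (mean0 - mean1)^2, computed here with unnormalized weights.
    """
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the lower class
    sum0 = 0    # sum of levels in the lower class
    for t, count in enumerate(hist):
        w0 += count
        sum0 += t * count
        if w0 == 0:
            continue            # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break               # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


# Two well-separated peaks: background around level 2, ink around level 12.
hist = [0, 5, 10, 5, 0, 0, 0, 0, 0, 0, 0, 5, 10, 5, 0, 0]
t = otsu_threshold(hist)
```

For a bimodal histogram like this one, the selected level falls in the valley between the two peaks, so a single threshold separates background from ink for the whole character image.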

[0020] Next, feature extraction is performed on the binarized and normalized character pattern (step 23). Any method that extracts feature quantities effective for discriminating characters will do; for example, it can be realized with "Hamanaka et al.: On the consistency between nonlinear normalization and feature extraction in handwritten kanji recognition, IEICE Technical Report, PRU91-85 (1991.11)". The features extracted for the character pattern are stored in the feature data storage means 302.

[0021] Next, the character recognition means 103 discriminates the character using the feature quantities stored in the feature data storage means 302 (step 24). That is, it determines from the extracted feature data which character the pattern is. If the feature quantities exemplified in step 23 are used, a suitable method is, for example, "J. Tsukumo: Improvement of the directional pattern matching method and its application to handwritten kanji recognition, IEICE Technical Report, PUR90-20 (1990.7)". Multiple recognition results, for example 10 candidates, are output in descending order of reliability.
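The output format of step 24 can be sketched as a ranked candidate list. The score dictionary and the function name `top_candidates` are assumptions for illustration; how the scores are computed is left to the recognition method and is not modeled here.

```python
def top_candidates(scores, k=10):
    """Return up to k character candidates, most reliable first.

    scores maps each candidate character to a reliability value
    (higher means more reliable).
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [ch for ch, _ in ranked[:k]]


# Hypothetical reliabilities for a crushed pattern of "東".
scores = {"本": 0.61, "東": 0.58, "木": 0.30}
candidates = top_candidates(scores, k=2)
```

Here the correct character is present but not in first place, which is the situation the post-processing step has to deal with.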

[0022] Next, the post-processing means 104 collates the recognition results with knowledge (step 25). Here the recognition result for one word is verified using knowledge, and the individual recognition result of each character is corrected. In doing so, characters that do not match the knowledge are detected. Finally, the collation result is output (step 26).

[0023] An example of the processing procedure in step 25 is described with reference to FIG. 3. FIG. 3 is a flowchart showing one implementation of the collation between the individual recognition result of each character and knowledge.

[0024] First, the candidates output by individual recognition are collated with knowledge (step 31).

[0025] Next, it is determined whether all of the characters to be matched are among the candidates output by individual character recognition (step 32). That is, when collation is performed using knowledge, it is checked whether every character is among the candidates of the individual character recognition result. If every character is among the candidates, the processing ends. Conversely, if collation leaves a character that does not correspond to the knowledge, that is, if the character that knowledge would substitute does not exist among the recognition candidates, graphic information is extracted from the character pattern of the corresponding portion (step 33). As this graphic information, the line width of the character is extracted, for example.
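The check in step 32 can be sketched as follows, assuming one candidate list per character position and a single dictionary word to test against. The function name `unmatched_positions` is hypothetical.

```python
def unmatched_positions(word, candidates_per_char):
    """Return the positions whose dictionary character is missing from the candidates.

    word is the dictionary entry being collated; candidates_per_char[i] is the
    candidate list that individual recognition produced for position i.
    An empty result means every character is covered and processing can end.
    """
    return [
        i
        for i, (ch, candidates) in enumerate(zip(word, candidates_per_char))
        if ch not in candidates
    ]


# "東" is absent from the candidates of position 0, so that pattern needs another pass.
needs_retry = unmatched_positions("東町", [["本", "木"], ["町", "村"]])
all_found = unmatched_positions("東町", [["本", "東"], ["町", "村"]])
```

Positions returned by this check are the ones for which graphic information would be extracted in step 33.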

[0026] Next, the binarization threshold for the character pattern of the corresponding portion is changed according to the character line width (step 34). The threshold is moved in the direction that thins the strokes if the line width is thick and, conversely, in the direction that thickens them if the line width is thin; the line width is extracted (step 35) and then detected again (step 36). This is repeated until the line width changes (loop 1). The step size of the threshold change is arbitrary. Once a change in line width is detected, the pattern is submitted to individual character recognition again (step 37), the candidates of the determination result are output (step 38), and word collation is performed again (step 31). This is repeated until the character to be matched appears among the candidates (loop 2). When the character to be matched appears, the collation result is output. If the character to be matched never appears among the candidates however the threshold is changed, that portion is treated as unrecognizable and is output together with the portions that could be collated (step 39).
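Loops 1 and 2 above can be sketched under several assumptions: line width is estimated as the mean length of horizontal ink runs, a pixel is ink when its density is at least the threshold, and the recognizer is passed in as a callback. The names `mean_run_width`, `retry_with_new_threshold`, and the parameter `thick` (the width above which strokes count as thick) are hypothetical.

```python
def binarize(gray, threshold):
    """Mark a pixel as ink (1) when its density is at least the threshold."""
    return [[1 if px >= threshold else 0 for px in row] for row in gray]


def mean_run_width(binary):
    """Crude line-width estimate: average length of horizontal runs of ink."""
    runs = []
    for row in binary:
        run = 0
        for px in row:
            if px:
                run += 1
            elif run:
                runs.append(run)
                run = 0
        if run:
            runs.append(run)
    return sum(runs) / len(runs) if runs else 0.0


def retry_with_new_threshold(gray, threshold, wanted, recognize,
                             step=1, max_level=255, thick=3.0):
    """Move the threshold until the wanted character shows up among the candidates.

    Returns (threshold, candidates) on success, or None when the threshold
    range is exhausted (the portion is then treated as unrecognizable).
    """
    width = mean_run_width(binarize(gray, threshold))
    # Thick strokes: raise the threshold to thin them; thin strokes: lower it.
    direction = step if width >= thick else -step
    t = threshold
    while 0 <= t + direction <= max_level:
        t += direction
        pattern = binarize(gray, t)
        new_width = mean_run_width(pattern)
        if new_width == width:
            continue                      # loop 1: graphic information unchanged
        width = new_width
        candidates = recognize(pattern)   # re-run individual recognition
        if wanted & set(candidates):
            return t, candidates          # loop 2 ends: wanted character appeared
    return None


# Stub recognizer for the demo: reports "東" once the centre gap opens up.
def stub_recognize(pattern):
    return ["東"] if 0 in pattern[0] else ["本"]


gray = [[9, 9, 5, 9, 9]]
result = retry_with_new_threshold(gray, 5, {"東"}, stub_recognize, max_level=9)
```

Recognition is only re-run when the measured line width actually changes, which avoids repeating the costly steps on an identical binary pattern.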

[0027] The graphic information need not be the character line width described above; the number of connected line segments, the number of loops, or a combination of these may be used instead.
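One possible reading of these alternative measures is sketched below. It interprets the connected count as the number of 8-connected ink regions and a loop as a background region that does not touch the image border; both interpretations, and the name `graphic_info`, are assumptions made for this example.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def _count_regions(grid, value, neighbours, skip_border_touching=False):
    """Count connected regions of cells equal to value, using breadth-first search."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if grid[y][x] != value or seen[y][x]:
                continue
            touches_border = False
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                if cy in (0, h - 1) or cx in (0, w - 1):
                    touches_border = True
                for dy, dx in neighbours:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and grid[ny][nx] == value and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if not (skip_border_touching and touches_border):
                count += 1
    return count


def graphic_info(binary):
    """Connected ink regions and enclosed holes (loops) of a binary pattern."""
    return {
        "components": _count_regions(binary, 1, N8),
        "loops": _count_regions(binary, 0, N4, skip_border_touching=True),
    }


ring = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]      # a stroke enclosing one hole
crushed = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]   # the same shape with the hole filled in
```

A crushed pattern loses its loops, so a change in the loop count is a usable signal that the threshold change has altered the shape.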

[0028] Next, a second embodiment of the present invention is described. In the first embodiment, collation ended and the result was output if, in step 32, every character collated by knowledge was present among the candidates of the individual character recognition result. In the second embodiment, by contrast, the binarization and re-recognition of the pattern described above are repeated until every collated character rises to first place in the individual character recognition result. As in the first embodiment, the graphic information detected in order to change the binarization threshold when repeating individual character recognition may be the character line width, the number of connected components, the number of loops, or any combination of these.
Any of the character line width, the number of connected components, the number of loops, and a combination thereof may be used.

【0029】次に、本発明における第3の実施例を説明
する。本実施例においてはステップ32において、知識
により照合可能な文字が候補中に複数現れる場合にいず
れかの文字が個別の文字認識の判定結果の第1位に上が
るまで、上述のパタンの2値化と再認識を繰り返すもの
である。個別の文字認識を繰り返す際に2値化の閾値を
変更するために検出する図形情報としては、第1の実施
例と同様、文字の線幅、連結成分数、ループ数およびそ
れらの組み合わせのいずれを用いても構わない。
Next, a third embodiment of the present invention will be described. In the present embodiment, in step 32, when a plurality of characters that can be collated by knowledge appear in a candidate, any one of the above characters is binarized until it reaches the first position in the judgment result of individual character recognition. It is to repeat recognition again. As in the first embodiment, the graphic information detected to change the binarization threshold value when repeating individual character recognition includes any of the line width of the character, the number of connected components, the number of loops, and a combination thereof. May be used.
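The three embodiments differ mainly in when the retry loop is allowed to stop. The sketch below states those conditions as predicates; the function names and the representation of knowledge as a word or as a set of allowed characters are assumptions for illustration.

```python
def first_embodiment_done(word, candidates_per_char):
    """Stop once every character of the word appears somewhere in its candidate list."""
    return all(ch in cands for ch, cands in zip(word, candidates_per_char))


def second_embodiment_done(word, candidates_per_char):
    """Stop only when every character of the word is the first-ranked candidate."""
    return all(
        bool(cands) and cands[0] == ch
        for ch, cands in zip(word, candidates_per_char)
    )


def third_embodiment_retry(allowed, candidates):
    """Retry when two or more allowed characters compete and none of them is ranked first.

    allowed is the set of characters that knowledge would accept at this position.
    """
    present = [ch for ch in candidates if ch in allowed]
    return len(present) >= 2 and candidates[0] not in allowed


cands = [["本", "東"], ["町"]]
```

For the candidate lists in `cands`, the word "東町" already satisfies the first embodiment's condition but not the second's, because "東" is only ranked second.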

[0030]

[Effects of the Invention] As described above, when the character to be matched does not appear among the candidates of the individual character recognition result, the present invention extracts the graphic information of the pattern of the corresponding portion, changes the binarization threshold until that information changes, and repeats recognition until the character appears among the candidates. This makes complete correspondence with knowledge possible and has the effect of significantly improving character recognition accuracy.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a character recognition device according to one embodiment of the present invention.

FIG. 2 is a flowchart showing an example of the operation of the character recognition device shown in FIG. 1.

FIG. 3 is a flowchart showing an example of the operation of the character recognition device shown in FIG. 1 during knowledge collation.

FIG. 4 is an explanatory diagram for explaining the principle of the present invention.

FIG. 5 is an explanatory diagram for explaining the principle of the present invention.

FIG. 6 is a diagram showing a determination method of a conventional character recognition device.

FIG. 7 is a diagram showing another determination method of a conventional character recognition device.

[Explanation of Symbols]

101 Binarization/normalization means
102 Feature extraction means
103 Character recognition means
104 Post-processing means
105 Graphic information extraction means
201 Control means
301 Image data storage means
302 Feature data storage means
401 Data bus

Claims (6)

[Claims]

1. A character recognition method that binarizes an input character pattern and performs word collation, characterized in that, for a character left unmatched during word collation, a pattern binarized with a changed threshold is generated again, individual character recognition is executed on it, and word collation is performed with reference to the new candidates of the discrimination result obtained.
2. The character recognition method according to claim 1, characterized in that the binarization threshold is changed according to the character line width, the number of connected line segments, or the number of loops.
【請求項3】入力文字パタンを電気信号に変換する光電
変換手段によって得られた信号を格納するイメージデー
タ格納手段と、 前記イメージデータ格納部に格納したイメージデータを
2値化し、同じ字種内での変動を吸収し、異なる字種間
の違いを強調する処理を行う2値化・正規化手段と、 前記2値化・正規化手段により処理を行った入力文字パ
タンから、文字の判別に有効な特徴量を抽出する特徴抽
出手段と、 前記特徴抽出手段によって抽出した文字の判別に有効な
特徴量を格納する特徴データ格納手段と、 前記特徴データ格納手段に格納した特徴量をもとに、該
入力文字パタンがどの字種に該当するかを判別し、それ
らを判定結果として可能性の高い順に複数候補出力する
文字認識手段と、 前記イメージデータ格納手段に格納されている該入力文
字パタンの図形情報を抽出する図形情報抽出手段と、 前記文字認識手段で得られた該入力文字パタンに対する
判定結果として得られた該複数候補に対し、入力文字パ
タンの知識を用いて該入力文字パタンの前後のパタンに
対する判定結果から照合を行い、該入力文字パタンの該
複数候補から1文字を認識結果として確定する後処理手
段とを備え、 前記後処理手段において認識結果を確定できなかった際
に、前記イメージデータ格納部に格納されている該入力
文字パタンの図形情報を前記図形情報抽出手段により抽
出し、それが現在の種と異なるまで前記2値化・正規化
手段において2値化の閾値を変化させて得られたパタン
に対し、前記後処理手段において認識結果が確定するま
で前記特徴抽出手段における特徴抽出、前記文字認識手
段による判定、前記後処理手段による修正を繰り返すこ
とを特徴とする文字認識装置。
3. A character recognition device comprising: image data storage means for storing a signal obtained by photoelectric conversion means that converts an input character pattern into an electric signal; binarization/normalization means for binarizing the image data stored in the image data storage unit, absorbing variation within the same character type and emphasizing differences between different character types; feature extraction means for extracting, from the input character pattern processed by the binarization/normalization means, feature quantities effective for discriminating characters; feature data storage means for storing the feature quantities extracted by the feature extraction means; character recognition means for determining, based on the feature quantities stored in the feature data storage means, which character type the input character pattern corresponds to, and outputting a plurality of candidates in descending order of likelihood as a determination result; graphic information extraction means for extracting graphic information of the stored input character pattern; and post-processing means for collating the plurality of candidates obtained by the character recognition means for the input character pattern against knowledge and against the determination results of the patterns preceding and following the input character pattern, and confirming one character from the plurality of candidates as the recognition result; wherein, when the post-processing means cannot confirm a recognition result, the graphic information of the input character pattern stored in the image data storage unit is extracted by the graphic information extraction means, the binarization threshold is changed in the binarization/normalization means until the graphic information differs from its current value, and feature extraction by the feature extraction means, determination by the character recognition means, and correction by the post-processing means are repeated on the resulting pattern until the post-processing means confirms the recognition result.
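The control flow recited in claim 3 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `binarize` and `recognize_with_retries`, the callback parameters, the fixed threshold step, and the stopping limit are all hypothetical and do not appear in the patent.

```python
from typing import Callable, List, Optional, Sequence

Image = List[List[int]]   # grayscale pixels, 0..255 (assumed representation)
Binary = List[List[int]]  # 1 = ink, 0 = background


def binarize(image: Image, threshold: int) -> Binary:
    """Pixels darker than the threshold become ink (1)."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def recognize_with_retries(
    image: Image,
    graphic_info: Callable[[Binary], int],
    extract_features: Callable[[Binary], object],
    recognize: Callable[[object], Sequence[str]],
    post_process: Callable[[Sequence[str]], Optional[str]],
    threshold: int = 128,      # assumed starting threshold
    step: int = 16,            # assumed threshold increment
    max_threshold: int = 255,  # assumed stopping limit
) -> Optional[str]:
    """Re-binarize until the graphic information changes, then repeat
    feature extraction, recognition, and post-processing."""
    pattern = binarize(image, threshold)
    while True:
        candidates = recognize(extract_features(pattern))
        result = post_process(candidates)
        if result is not None:
            return result  # the post-processing step confirmed a character
        current = graphic_info(pattern)
        # Change the threshold until the graphic information differs
        # from its current value.
        while graphic_info(pattern) == current:
            threshold += step
            if threshold > max_threshold:
                return None  # no further distinct pattern is available
            pattern = binarize(image, threshold)
```

The inner loop is the point of the claim: a new threshold is only worth re-recognizing if it actually changes the shape of the pattern, so thresholds that leave the graphic information unchanged are skipped.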
4. The character recognition device according to claim 3, wherein, when the post-processing means adopts a determination result other than the first-ranked candidate of the determination result output by the character recognition means, the graphic information of the input character pattern stored in the image data storage unit is extracted by the graphic information extraction means, the binarization threshold is changed in the binarization/normalization means until the graphic information differs from its current value, and feature extraction by the feature extraction means, determination by the character recognition means, and correction by the post-processing means are repeated on the resulting pattern until the post-processing means confirms the first-ranked candidate of the determination result as the recognition result.
5. The character recognition device according to claim 3, wherein, when the post-processing means adopts a candidate of the determination result output by the character recognition means and a plurality of collatable characters exist among the candidates, the graphic information of the input character pattern stored in the image data storage unit is extracted by the graphic information extraction means, the binarization threshold is changed in the binarization/normalization means until the graphic information differs from its current value, and feature extraction by the feature extraction means, determination by the character recognition means, and correction by the post-processing means are repeated on the resulting pattern until the post-processing means confirms the first-ranked candidate of the determination result as the recognition result.
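Claims 3 to 5 each name a different condition under which the pattern is re-binarized. The sketch below gathers the three conditions into one hypothetical helper for illustration. The function name `retry_reason`, the returned labels, and the idea of passing the collatable characters as a set are assumptions made here; the patent states the conditions in separate claims and does not prescribe how they are combined.

```python
from typing import Optional, Sequence, Set


def retry_reason(candidates: Sequence[str], collatable: Set[str]) -> Optional[str]:
    """Return why re-binarization would be triggered, or None if the
    first-ranked candidate can be confirmed as is.

    candidates: recognition candidates in descending order of likelihood.
    collatable: characters that collate with the knowledge and the
                neighbouring results at this position (assumed input).
    """
    matches = [c for c in candidates if c in collatable]
    if not matches:
        return "unconfirmed"     # claim 3: no recognition result can be confirmed
    if len(matches) > 1:
        return "ambiguous"       # claim 5: several candidates are collatable
    if matches[0] != candidates[0]:
        return "not_first_rank"  # claim 4: adopted candidate is not ranked first
    return None


print(retry_reason(["A", "B", "C"], {"B"}))
```

With the candidates `A`, `B`, `C` and only `B` collatable, the helper reports the claim 4 condition, because the adopted character is not the first-ranked candidate.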
6. The character recognition device according to claim 3, 4 or 5, wherein the graphic information extraction means extracts, as the graphic information, the line width, the number of connected components, or the number of loops of the image data.
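The three kinds of graphic information named in claim 6 can be computed from a binarized pattern in several ways. The following is one possible sketch, not the method of the patent: it assumes a non-empty grid with 1 for ink and 0 for background, counts ink components with 8-connectivity, counts loops as enclosed background regions with 4-connectivity, and uses the mean horizontal run length as a rough stand-in for line width. All function names are hypothetical.

```python
from typing import List, Tuple

Binary = List[List[int]]  # 1 = ink, 0 = background (assumed representation)

N4 = [(1, 0), (-1, 0), (0, 1), (0, -1)]
N8 = N4 + [(1, 1), (1, -1), (-1, 1), (-1, -1)]


def count_regions(grid: Binary, value: int, neighbours: List[Tuple[int, int]]) -> int:
    """Count connected regions of cells equal to `value` by flood fill."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or seen[r][c]:
                continue
            regions += 1
            seen[r][c] = True
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for dy, dx in neighbours:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == value and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
    return regions


def connected_components(pattern: Binary) -> int:
    """Number of 8-connected ink components."""
    return count_regions(pattern, 1, N8)


def loop_count(pattern: Binary) -> int:
    """Number of background regions fully enclosed by ink."""
    cols = len(pattern[0])
    # Pad with background so the outside forms exactly one region.
    padded = ([[0] * (cols + 2)]
              + [[0] + list(row) + [0] for row in pattern]
              + [[0] * (cols + 2)])
    return count_regions(padded, 0, N4) - 1


def mean_line_width(pattern: Binary) -> float:
    """Mean length of horizontal ink runs, a rough proxy for line width."""
    runs = []
    for row in pattern:
        length = 0
        for px in list(row) + [0]:  # trailing 0 closes a run at the row end
            if px:
                length += 1
            elif length:
                runs.append(length)
                length = 0
    return sum(runs) / len(runs) if runs else 0.0
```

A threshold change that breaks a stroke shows up directly in these values: a closed ring has one loop, while the same ring with a gap has none, which is the kind of difference the re-binarization loop waits for.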
JP7296571A 1995-11-15 1995-11-15 Character recognition method and device Expired - Lifetime JP2973898B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP7296571A JP2973898B2 (en) 1995-11-15 1995-11-15 Character recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP7296571A JP2973898B2 (en) 1995-11-15 1995-11-15 Character recognition method and device

Publications (2)

Publication Number Publication Date
JPH09138834A (en) 1997-05-27
JP2973898B2 (en) 1999-11-08

Family

ID=17835271

Family Applications (1)

Application Number Status Publication Priority Date Filing Date Title
JP7296571A Expired - Lifetime JP2973898B2 (en) 1995-11-15 1995-11-15 Character recognition method and device

Country Status (1)

Country Link
JP (1) JP2973898B2 (en)

Also Published As

Publication number Publication date
JP2973898B2 (en) 1999-11-08

Legal Events

Date Code Title Description
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 19990803