JP3421200B2

JP3421200B2 - Character recognition method and device

Info

Publication number: JP3421200B2
Application number: JP22372096A
Authority: JP
Inventors: Katsuhito Fujimoto (克仁藤本); Hiroshi Kamada (洋鎌田)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1996-08-26
Filing date: 1996-08-26
Publication date: 2003-06-30
Anticipated expiration: 2016-08-26
Also published as: JPH1063784A

Description

Detailed Description of the Invention

[0001]

Technical Field of the Invention

The present invention relates to a character recognition method and device, and more particularly to a character recognition method and device that output the reliability of a character recognition result so that highly accurate, high-speed character recognition can be performed. In recent years, to make office workflows more efficient, there has been strong demand for character recognition devices that file documents electronically and encode them as needed. In particular, a character recognition device is indispensable for encoding character-string information, and practical use and widespread adoption require high-speed estimation of character types without loss of recognition accuracy.

[0002] Outputting the reliability of the recognition result makes it possible to combine a fast character recognition method whose results have low reliability with a slow method whose results have high reliability, yielding a character recognition function that has the advantages of both: it is fast and its results are highly reliable. The reliability can also be used to narrow down the results that need inspection and correction, making post-processing more efficient. In short, a character recognition device that outputs the reliability of its results is an important technical element in building a practical document recognition system.

[0003]

Description of the Related Art

A known character recognition method that outputs reliability uses the distance value of the first candidate of the recognition result directly as the reliability. FIG. 7 illustrates this conventional method, and the prior art is described below with reference to it.

[0004] In FIG. 7, features are extracted from the character pattern to be recognized (S1) to obtain a feature vector. Next, to match this feature vector against the representative feature vector of each character stored in recognition dictionary D1, a distance value is calculated for each character (S2). The distance values are then sorted to obtain the first, second, third, ... candidate characters and their distance values d1, d2, .... Finally, the distance value d1 of the first candidate character is output as the reliability of the recognition result.
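The conventional scheme of FIG. 7 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, feature vectors are plain lists of floats, and Euclidean distance is assumed because the text only says "distance value". Note that the returned d1 is just a raw distance whose scale depends on the feature space and dictionary, which is exactly the weakness discussed in [0005] and [0006].

```python
import math
from typing import Dict, List, Sequence, Tuple


def recognize(feature: Sequence[float],
              dictionary: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Match a feature vector against every dictionary entry (step S2) and
    return (character, distance) pairs sorted by ascending distance."""
    # Euclidean distance is an assumption; the text only says "distance value".
    scored = [(ch, math.dist(feature, ref)) for ch, ref in dictionary.items()]
    return sorted(scored, key=lambda pair: pair[1])


def conventional_reliability(feature: Sequence[float],
                             dictionary: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Prior-art scheme: report the first candidate and use its raw
    distance d1 as the 'reliability' (smaller distance = more confident)."""
    candidates = recognize(feature, dictionary)
    first_char, d1 = candidates[0]
    return first_char, d1
```

For example, with a two-entry dictionary `{"A": [0.0, 0.0], "B": [3.0, 4.0]}` and input `[0.0, 1.0]`, the first candidate is "A" with d1 = 1.0; nothing in that number says how likely "A" is to be correct.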

[0005]

Problems to Be Solved by the Invention

In the method described above, which outputs the first candidate's distance value directly as the reliability, the correspondence between the output reliability and the probability that the result is read correctly (the correct reading probability) is unclear. This causes the following problems. First, when the conventional method is used alone and a reliability threshold must be set in order to reject low-reliability results or to flag them for error correction, the unclear correspondence between reliability and correct reading probability means the threshold has to be determined experimentally or by some similar procedure, which is cumbersome.

[0006] Second, when the first candidate's distance value is used directly as the reliability, it is difficult to integrate two different character recognition methods into a single device that combines the strengths of both. The reliability obtained from each method depends on the distance values that method outputs, and hence on the method itself. If, for example, the integration rule is to output whichever result has the higher reliability, the two reliabilities cannot easily be compared with each other.

[0007] The present invention was made in view of the above circumstances. Its first object is to provide a character recognition method and device that output an estimate of the correct reading probability of the recognition result as the reliability, thereby giving the reliability a clear meaning and simplifying post-processing. Its second object is to make the reliabilities of different character recognition methods comparable with each other, again by outputting the estimated correct reading probability as the reliability, so that different methods can be integrated to achieve highly accurate, high-speed character recognition.

[0008]

Means for Solving the Problems

FIG. 1 shows the principle of the present invention. FIG. 1(a) is a principle diagram of claims 1 to 3 and claims 5 to 7, which output a correct reading probability from a character recognition result. In the figure, 1 is a character recognition means that recognizes characters from a character pattern; 2 is a probability parameter calculation means that calculates, from the distance values of the candidate characters obtained by character recognition means 1, a probability parameter indicating the confidence of the recognition result; and 3 is a correct reading probability output means that reads the correct reading probability corresponding to the probability parameter obtained by calculation means 2 from probability parameter-to-correct reading probability conversion table 5 and outputs it. Further, 4 is a conversion table generation means that generates conversion table 5, which converts probability parameters into correct reading probabilities, from a set of character patterns and their correct character codes.

[0009] FIG. 1(b) is a principle diagram of claims 4 and 8, which use the correct reading probability to integrate a fast but low-accuracy first character recognition means with a slow but high-accuracy second character recognition means, achieving character recognition that is both fast and accurate. In the figure, 6a is the fast, low-accuracy first character recognition means, which recognizes a character pattern quickly but with low accuracy and outputs candidate characters and their distance values. 7 is a correct reading probability estimation means, which computes a probability parameter from the distance values and estimates the correct reading probability by referring to probability parameter-to-correct reading probability conversion table 5. 8 is a correct reading probability comparison means, which compares the correct reading probability estimated by estimation means 7 with a threshold. 6b is the slow, high-accuracy second character recognition means, which performs slow, accurate recognition when the correct reading probability estimated by estimation means 7 is lower than the threshold. 9 is a recognition result output means: when the correct reading probability estimated by estimation means 7 is greater than the threshold, it outputs the recognition result of first character recognition means 6a, and when the estimated correct reading probability is lower than the threshold, it outputs the recognition result of second character recognition means 6b.
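The control flow of FIG. 1(b) can be sketched as a simple cascade. This is an illustrative sketch under stated assumptions, not the patented device: `cascade_recognize`, `fast`, `slow`, and `estimate_p` are hypothetical names, each recognizer is assumed to return `(character, distance)` pairs sorted by ascending distance, and `estimate_p` stands in for means 7 (probability parameter plus table lookup). The text does not say what happens when the probability exactly equals the threshold; here such ties go to the slow recognizer.

```python
from typing import Callable, List, Tuple

# (character, distance) pairs, sorted by ascending distance.
Candidates = List[Tuple[str, float]]


def cascade_recognize(pattern: object,
                      fast: Callable[[object], Candidates],
                      slow: Callable[[object], Candidates],
                      estimate_p: Callable[[Candidates], float],
                      threshold: float) -> Tuple[str, str]:
    """Run the fast recognizer (6a) first and fall back to the slow one (6b)
    only when the estimated correct reading probability is not above the
    threshold. Returns (recognized character, which recognizer was used)."""
    fast_result = fast(pattern)
    if estimate_p(fast_result) > threshold:
        return fast_result[0][0], "fast"   # means 9 outputs 6a's result
    # Ties and low probabilities go to the slow recognizer (our choice for ties).
    slow_result = slow(pattern)
    return slow_result[0][0], "slow"       # means 9 outputs 6b's result
```

Because the slow recognizer runs only for patterns whose estimated probability is low, the average cost stays close to that of the fast recognizer when most inputs are easy.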

[0010] As shown in FIG. 1, the present invention solves the above problems as follows.
(1) A character pattern is input; a character recognition means calculates a plurality of candidate characters for the input pattern and a distance value for each; a probability parameter is calculated from at least the distance values of the first and second candidates; and the correct reading probability corresponding to the probability parameter is output by referring to a table that stores probability parameters and their corresponding correct reading probabilities.
(2) In (1), the probability parameter is calculated for each pattern in a given training character pattern set, and the conversion table between probability parameters and correct reading probabilities is generated by taking, as the estimated correct reading probability, the ratio of the number of training patterns whose first candidate was correct to the number of training patterns having the same probability parameter.

[0011] (3) A character pattern is input and recognized by a relatively fast, low-accuracy first character recognition means. A probability parameter is calculated from at least the distance values of the first and second of the resulting candidate characters, and the estimated correct reading probability corresponding to that parameter is obtained by referring to a table that stores probability parameters and their corresponding correct reading probabilities. When the estimated correct reading probability is greater than a predetermined threshold, the recognition result obtained by the first character recognition means is output; when it is smaller than the threshold, the character pattern is recognized by a relatively slow, high-accuracy second character recognition means, and the recognition result obtained by the second means is output.

[0012] (4) A character pattern is input. A coarse classification matching means matches the feature vector of the character pattern to be recognized against the feature vectors stored in a first recognition dictionary and outputs K2 candidate characters in ascending order of distance. A detailed classification matching means matches the K1 candidates with the smallest distance values among those K2 candidates (K1 ≤ K2) against feature vectors read from a second recognition dictionary. A probability parameter is calculated from the distance values of the candidate characters obtained by this matching, and the estimated correct reading probability corresponding to it is obtained by referring to a table that stores probability parameters and their corresponding correct reading probabilities. If the correct reading probability is greater than a threshold, that candidate character is output as the recognition result. If it is lower than the threshold, the feature vectors of all K2 candidate characters output by the coarse classification matching means are read from the second recognition dictionary and matched against the feature vector of the character pattern; a probability parameter is calculated from the distance values of the candidates so obtained, its estimated correct reading probability is obtained from the table, and if that probability is greater than the threshold, the candidate character is output as the recognition result.
(5) A character recognition device comprises: a character recognition means that receives a character pattern and outputs a plurality of candidate characters and their distance values; a probability parameter calculation means that calculates a probability parameter from at least the distance values of the first and second of those candidates; a conversion table storing an estimated correct reading probability for each probability parameter; and a correct reading probability output means that refers to the conversion table to obtain and output the correct reading probability corresponding to the probability parameter calculated by the probability parameter calculation means.

[0013] (6) The conversion table of (5), which stores estimated correct reading probabilities, is generated by calculating the probability parameter for each pattern in a given training character pattern set, estimating the correct reading probability as the ratio of the number of training patterns whose first candidate was correct to the number of training patterns having the same probability parameter, and storing each estimate together with its probability parameter in a storage means.

[0014] (7) A character recognition device comprises: a relatively fast, low-accuracy first character recognition means that recognizes an input character pattern; a relatively slow, high-accuracy second character recognition means that recognizes an input character pattern; a conversion table storing probability parameters and their corresponding correct reading probabilities; a correct reading probability estimation means that calculates a probability parameter from at least the distance values of the first and second candidate characters and obtains, by referring to the conversion table, the estimated correct reading probability corresponding to that parameter; and a correct reading probability comparison means that compares the estimated correct reading probability obtained by the estimation means for the candidate characters from the first character recognition means with a predetermined threshold, outputs the recognition result of the first character recognition means when the probability is greater than the threshold, and outputs the recognition result of the second character recognition means when it is smaller than the threshold.
(8) A character recognition device comprises: a coarse classification matching means that receives a character pattern, matches its feature vector against the feature vectors stored in a first recognition dictionary, and outputs K2 candidate characters in ascending order of distance; a detailed classification matching means that matches the K1 candidates with the smallest distance values among those K2 candidates (K1 ≤ K2) against feature vectors read from a second recognition dictionary; a conversion table storing probability parameters and their corresponding correct reading probabilities; a probability parameter calculation means that calculates a probability parameter from the distance values of the candidate characters obtained by the matching; a correct reading probability estimation means that refers to the conversion table to obtain the estimated correct reading probability corresponding to the calculated probability parameter; and a correct reading probability comparison means that outputs the candidate character as the recognition result if the correct reading probability is greater than the threshold and, if it is lower, requests the detailed classification matching means to match the feature vectors of all K2 candidates output by the coarse classification matching means against the feature vector of the character pattern, then outputs the candidate character as the recognition result if the correct reading probability corresponding to the probability parameter obtained from that matching is greater than the threshold.

[0015] In the inventions of claims 1 and 2 and claims 5 and 6, configured as in (1) and (2) and (5) and (6) above, the correct reading probability, whose meaning is clear, can be used as the criterion for evaluating recognition results, which simplifies post-processing. In the inventions of claims 3, 4, 7, and 8, configured as in (3), (4), (7), and (8) above, different character recognition methods are integrated to obtain fast and highly accurate recognition results.

[0016]

Embodiments of the Invention

FIG. 2 shows the configuration of a system according to a first embodiment of the present invention: a character recognition device that can output the correct reading probability. It provides a table that maps probability parameters to correct reading probabilities, calculates a probability parameter from the distance values of the candidate characters, and reads the correct reading probability corresponding to that parameter from the table.

[0017] In the figure, 11 is the character pattern to be recognized, D1 is a recognition dictionary storing the character codes of a plurality of characters and their feature vectors, and 12 is a character recognition means. Character recognition means 12 extracts features from character pattern 11 to obtain a feature vector and matches it against the feature vector of each character stored in the recognition dictionary to calculate distance values. It then sorts the characters in ascending order of distance and outputs each character code with its distance value d1, d2, d3, ... as the first, second, ... candidates. 13 is a probability parameter calculation means, which calculates the probability parameter r from, for example, the distance value d1 of the first candidate character and the distance value d2 of the second candidate character by the following equation (1):

$$ r = \frac{d_2}{d_1 + d_2} \qquad (1) $$
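Equation (1) is a one-line computation, shown below as a sketch. Because the candidates are sorted so that 0 ≤ d1 ≤ d2, r always lies between 0.5 (first and second candidates equally close) and 1 (first candidate at distance zero). The function name is hypothetical, and the handling of d1 = d2 = 0, which the text does not cover, is our own assumption.

```python
from typing import Sequence


def probability_parameter(distances: Sequence[float]) -> float:
    """Equation (1): r = d2 / (d1 + d2), from the two smallest distances.

    `distances` must be sorted in ascending order (first candidate first).
    Since 0 <= d1 <= d2, r lies in [0.5, 1]; r near 1 means the first
    candidate is much closer than the runner-up.
    """
    d1, d2 = distances[0], distances[1]
    if d1 + d2 == 0:
        # Both distances are zero: not covered by the text; returning 0.5
        # (complete ambiguity) is an assumption made for this sketch.
        return 0.5
    return d2 / (d1 + d2)
```

For instance, distances (1.0, 3.0) give r = 0.75, while equal distances give r = 0.5.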

[0018] D2 is a storage means storing a set of character patterns and their correct character codes. 15 is a probability parameter-to-correct reading probability conversion table generation means, which generates the conversion table T(r) from the set of character patterns and correct character codes stored in storage means D2, as described later. 14 is a correct reading probability output means, which reads the correct reading probability corresponding to the probability parameter r calculated by probability parameter calculation means 13 from conversion table T(r) and outputs it as the correct reading probability p.

[0019] Methods for obtaining the first candidate, second candidate, ... and their distance values d1, d2, d3, ... from character pattern 11 are well known, so the following describes how the conversion table T(r) is generated. FIG. 3 illustrates an example method of generating T(r). First, in step S1, the first character pattern and its correct character code are read from storage means D3, which stores training data consisting of a set of character patterns and correct character codes. In step S2, character recognition is performed to obtain the first, second, third, ... candidates and their distance values d1, d2, .... That is, as described above, each character stored in a recognition dictionary (not shown) is matched against the character pattern to calculate distance values, and these are sorted in ascending order of distance to obtain each character code and its distance value d1, d2, d3, ....

[0020] In step S3, the character code of the first candidate is compared with the correct character code read from storage means D3 to judge whether the pattern was read correctly or misread. Meanwhile, in step S4, the probability parameter R is calculated by equation (1). In step S5, it is checked whether the judgment result is a correct reading or a misreading. If it is a correct reading, the process proceeds to step S6, where 1 is added to the number of correctly read characters $N_{ok}(r_n)$ for the bin $r_n \le R \le r_n + \delta r$ of the histogram shown in the figure (all of whose values are initially set to 0); the process then proceeds to step S7, where 1 is added to the number of training patterns $N(r_n)$ for the same bin.

[0021] If the judgment result is a misreading, the process proceeds directly to step S7, where, as above, 1 is added to the number of training patterns $N(r_n)$ for the bin $r_n \le R \le r_n + \delta r$. Next, in step S8, it is determined whether $N_{ok}(r_n)$ and $N(r_n)$ have been accumulated for all character patterns stored in storage means D3. If not, the process returns to step S1 and repeats the above processing. When all training patterns have been processed, the process proceeds to step S9, where

$$ P(r_n) = \frac{N_{ok}(r_n)}{N(r_n)} $$

is calculated for every bin $r_n$ of the histogram to obtain the correct reading probability $P(r_n)$. As a result, the conversion table $T(r_n)$ shown in the figure is obtained for each probability parameter $r_n$.
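Steps S1 to S9 amount to a two-counter histogram followed by a per-bin ratio. The sketch below assumes that steps S2 to S4 have already been run, so each training pattern arrives as a pair (R, whether the first candidate was correct). The function name, the default bin width δr = 0.05, and the lower edge 0.5 are illustrative assumptions. Bins are taken as half-open, $r_n \le R < r_n + \delta r$, so that each R falls into exactly one bin, and bins that received no patterns are simply left out rather than divided by zero.

```python
import math
from typing import Dict, Iterable, Tuple


def build_table(samples: Iterable[Tuple[float, bool]],
                delta_r: float = 0.05,
                r_min: float = 0.5) -> Dict[float, float]:
    """Sketch of steps S1-S9 of FIG. 3.

    `samples` yields (R, first_candidate_correct) for each training pattern,
    i.e. the outputs of steps S2-S4. Returns {r_n: P(r_n)}.
    """
    n_ok: Dict[int, int] = {}   # N_ok(r_n), indexed by bin number n
    n_all: Dict[int, int] = {}  # N(r_n), indexed by bin number n
    for r_value, correct in samples:
        n = math.floor((r_value - r_min) / delta_r)  # bin: r_n <= R < r_n + delta_r
        if correct:
            n_ok[n] = n_ok.get(n, 0) + 1             # step S6
        n_all[n] = n_all.get(n, 0) + 1               # step S7
    # Step S9: P(r_n) = N_ok(r_n) / N(r_n); empty bins are omitted.
    return {r_min + n * delta_r: n_ok.get(n, 0) / count
            for n, count in n_all.items()}
```

With δr = 0.25, the four samples (0.6, correct), (0.7, misread), (0.9, correct), (0.95, correct) give P = 0.5 for the bin starting at 0.5 and P = 1.0 for the bin starting at 0.75.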

[0022] The conversion table $T(r_n)$ obtained as above gives the correct reading probability $P(r_n)$ only for the discrete values $r_n$. The correct reading probability $P(r)$ for an arbitrary value r can be obtained by finding $r_n$ and $r_{n+1}$ such that $r_n \le r \le r_{n+1}$ (where $r_{n+1} = r_n + \delta r$) and interpolating between $P(r_n)$ and $P(r_{n+1})$. By generating the conversion table T(r) in this way, the correct reading probability P(r) can be obtained from the probability parameter r calculated by probability parameter calculation means 13 shown in FIG. 2. Obtaining the correct reading probability in this way gives the reliability a clear meaning and makes post-processing easier.
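The lookup-with-interpolation step can be sketched as below. The text only says "interpolation", so linear interpolation between neighbouring table entries is an assumption, as is clamping values of r outside the table range to the nearest end entry. The function name is hypothetical, and the table is assumed to be a non-empty dict mapping $r_n$ to $P(r_n)$.

```python
import bisect
from typing import Dict


def lookup(table: Dict[float, float], r: float) -> float:
    """Estimate P(r) for an arbitrary r from the discrete table {r_n: P(r_n)}.

    Linear interpolation between neighbouring entries is an assumption
    (the text only says "interpolation"); r outside the table range is
    clamped to the nearest end entry.
    """
    keys = sorted(table)
    if r <= keys[0]:
        return table[keys[0]]
    if r >= keys[-1]:
        return table[keys[-1]]
    i = bisect.bisect_right(keys, r)          # keys[i-1] <= r < keys[i]
    r_lo, r_hi = keys[i - 1], keys[i]
    t = (r - r_lo) / (r_hi - r_lo)            # fractional position in [r_lo, r_hi)
    return table[r_lo] + t * (table[r_hi] - table[r_lo])
```

For a table {0.5: 0.5, 0.75: 1.0}, r = 0.625 lies halfway between the two entries and yields P = 0.75.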

【００２３】次に、上記正読確率を用いて異なる文字認
識手法を統合し、高精度かつ高速な文字認識を行うこと
ができるようにした実施例について説明する。図４は上
記した第２の実施例のシステムの構成を示す図である。
同図において、２１は認識対象となる文字パターン、２
２は文字パターンから特徴ベクトルを抽出する特徴ベク
トル抽出手段である。Ｄ１１はＭ次元の特徴ベクルトを
持つＬ文字種（例えば３６００文字種）の文字を格納し
た第１の認識辞書、２３は大分類照合手段であり、大分
類照合手段２３は上記第１の認識辞書Ｄ１１に格納され
たＬ文字分の特徴ベクトル（Ｍ次元）と上記文字パター
ン２１の特徴ベクトルを照合し、距離が小さい順にＫ２
個（例えばＫ２＝３００）の候補文字を出力する。
Next, an embodiment is described in which different character recognition methods are integrated by using the correct reading probability described above, enabling high-accuracy, high-speed character recognition. FIG. 4 is a diagram showing the configuration of the system of this second embodiment. In the figure, 21 is a character pattern to be recognized, and 22 is a feature vector extraction means that extracts a feature vector from the character pattern. D11 is a first recognition dictionary that stores M-dimensional feature vectors for L character types (for example, 3,600 character types), and 23 is a large classification matching means. The large classification matching means 23 collates the M-dimensional feature vectors of the L characters stored in the first recognition dictionary D11 with the feature vector of the character pattern 21 and outputs K2 candidate characters (for example, K2 = 300) in ascending order of distance.
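The large classification step can be sketched as a nearest-neighbour search over all L dictionary entries that keeps the K2 closest. The distance measure is not specified in this section, so the sketch assumes squared Euclidean distance; the dictionary is modelled as a hypothetical mapping from character to M-dimensional vector.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (an assumed distance measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def large_classification(
    pattern_m: Sequence[float],
    dictionary_m: Dict[str, Sequence[float]],
    k2: int,
) -> List[Tuple[str, float]]:
    """Match the M-dimensional pattern vector against all L entries of the
    first recognition dictionary (L x M work) and return the K2 candidates
    with the smallest distances, in ascending order of distance."""
    scored = (
        (squared_euclidean(pattern_m, vec), char)
        for char, vec in dictionary_m.items()
    )
    best = heapq.nsmallest(k2, scored)  # sorted ascending by distance
    return [(char, dist) for dist, char in best]
```

`heapq.nsmallest` avoids fully sorting all L distances when only K2 of them are needed.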

【００２４】Ｄ１２はＮ次元（Ｍ≦Ｎ）の特徴ベクトル
を持つＬ文字種の文字を格納した第２の認識辞書、２４
は詳細分類照合手段である。詳細分類照合手段２４は、
まず、上記大分類照合手段２３が出力するＫ２個の候補
文字の内、距離値が小さいＫ１（Ｋ１≦Ｋ２；例えばＫ
１＝７０）文字分の特徴ベクトル（Ｎ次元）を上記第２
の認識辞書Ｄ１２から読み出して文字パターン２１の特
徴ベクトルと照合し、その結果得られた候補文字の正読
率が低い場合には、大分類照合手段２３が出力するＫ２
文字分の候補文字の特徴ベクトル（Ｎ次元）を第２の認
識辞書Ｄ１２から読み出して、文字パターン２１の特徴
ベクトルと照合する。
D12 is a second recognition dictionary that stores N-dimensional (M ≤ N) feature vectors for the L character types, and 24 is a detailed classification matching means. The detailed classification matching means 24 first reads from the second recognition dictionary D12 the N-dimensional feature vectors of the K1 characters (K1 ≤ K2; for example, K1 = 70) with the smallest distance values among the K2 candidate characters output by the large classification matching means 23, and collates them with the feature vector of the character pattern 21. If the correct reading probability of the resulting candidate character is low, it reads the N-dimensional feature vectors of all K2 candidate characters output by the large classification matching means 23 from the second recognition dictionary D12 and collates them with the feature vector of the character pattern 21.
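The detailed classification step re-scores a prefix of the coarse candidate list against the higher-dimensional second dictionary. In this sketch the same function serves both passes: call it with k = K1 first and, if needed, with k = K2. The squared Euclidean distance and all names are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def detailed_classification(
    pattern_n: Sequence[float],
    dictionary_n: Dict[str, Sequence[float]],
    coarse_candidates: List[Tuple[str, float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Re-match the first k coarse candidates (already sorted by coarse
    distance) against the N-dimensional second recognition dictionary and
    return (character, distance) pairs sorted by the new distance."""
    rescored = []
    for char, _coarse_distance in coarse_candidates[:k]:
        vec = dictionary_n[char]
        # Squared Euclidean distance is an assumption; the text does not fix it.
        dist = sum((x - y) ** 2 for x, y in zip(pattern_n, vec))
        rescored.append((char, dist))
    rescored.sort(key=lambda item: item[1])
    return rescored
```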

【００２５】２５は、確率パラメータ算出手段であり、
上記詳細分類照合手段２４により得られた候補文字の距
離値から例えば前記式（１）により確率パラメータｒを
算出する。２６は、正読確率比較手段であり、確率パラ
メータ算出手段２５により求めた確率パラメータｒに対
応する正読確率Ｐ（ｒ）を確率パラメータ−正読確率変
換テーブルＴ（ｒ）から読み出し、予め定められたしき
い値Ｐthと比較する。そして、上記詳細分類照合手段２
４において、Ｋ１文字分の照合結果により得られた候補
文字の正読確率が上記しきい値Ｐthより大きい場合に
は、その候補文字を認識結果として出力し、上記正読確
率が上記しきい値Ｐthより小さい場合には、詳細分類照
合手段２４に対して上記Ｋ２文字分の照合を依頼する。
Reference numeral 25 is a probability parameter calculating means, which calculates the probability parameter r from the distance values of the candidate characters obtained by the detailed classification matching means 24, for example by equation (1) above. Reference numeral 26 is a correct reading probability comparing means, which reads the correct reading probability P(r) corresponding to the probability parameter r obtained by the probability parameter calculating means 25 from the probability parameter-correct reading probability conversion table T(r) and compares it with a predetermined threshold value Pth. When the correct reading probability of the candidate character obtained by the detailed classification matching means 24 from matching K1 characters is larger than the threshold value Pth, that candidate character is output as the recognition result; when the correct reading probability is smaller than Pth, the comparing means requests the detailed classification matching means 24 to collate the K2 characters.

【００２６】図５、図６は図４の文字認識システムの動
作を示すフローチャートであり、同図により本実施例の
動作をさらに詳細に説明する。認識対象となる文字パタ
ーンが与えられると、ステップＳ１において、特徴抽出
を行い、Ｎ次元の特徴ベクトルを求める。ステップＳ２
において、第１の認識辞書Ｄ１１からＬ文字分のＭ次元
の特徴ベクトルを読み出し、読み出したＬ文字種の文字
と上記認識対象となる文字パターン（Ｍ次元）との距離
計算（Ｌ×Ｍ次元）を行い、候補文字とその距離値ｄ
１’，ｄ２’，ｄ３’，…を得る。ステップＳ３におい
て、Ｋ＝７０に設定し、ステップＳ４において、上記ス
テップＳ２で得た候補文字から、距離値の小さい順にＫ
（＝７０）文字分の文字を取り出す。
FIGS. 5 and 6 are flowcharts showing the operation of the character recognition system of FIG. 4; the operation of this embodiment is described in more detail with reference to them. When a character pattern to be recognized is given, features are extracted in step S1 to obtain an N-dimensional feature vector. In step S2, the M-dimensional feature vectors of the L characters are read from the first recognition dictionary D11, the distances between these L character types and the character pattern to be recognized (M-dimensional) are calculated (L × M dimensions), and candidate characters with their distance values d1', d2', d3', ... are obtained. In step S3, K is set to 70, and in step S4, K (= 70) characters are taken from the candidate characters obtained in step S2 in ascending order of distance value.

【００２７】ステップＳ５において、上記Ｋ文字分のＮ
次元の特徴ベクトルを第２の認識辞書Ｄ１２から読み出
し、読み出したＫ文字分の文字と上記認識対象となる文
字パターン（Ｎ次元）との距離計算（Ｋ×Ｎ次元）を
行う。ステップＳ６において、上記距離計算の結果得
られた候補文字の距離値を小さい方からソートし、第
１，第２，…の候補文字とその距離値ｄ１，ｄ２，…を
得る。ステップＳ７において、上記距離値ｄ１，ｄ２に
基づき前記式（１）により確率パラメータｒを求め、図
６のステップＳ８において、変換テーブルＴ（ｒ）を参
照して上記確率パラメータｒに対応した正読確率Ｐを求
める。なお、変換テーブルＴ（ｒ）に上記確率パラメー
タｒに一致する値が格納されていない場合には、前記し
たように補間により正読確率Ｐを求める。
In step S5, the N-dimensional feature vectors of the K characters are read from the second recognition dictionary D12, and the distances between these K characters and the character pattern to be recognized (N-dimensional) are calculated (K × N dimensions). In step S6, the distance values of the candidate characters obtained from this calculation are sorted in ascending order, giving the first, second, ... candidate characters and their distance values d1, d2, .... In step S7, the probability parameter r is obtained from the distance values d1 and d2 by equation (1), and in step S8 of FIG. 6, the correct reading probability P corresponding to r is obtained by referring to the conversion table T(r). When T(r) does not store a value matching r, P is obtained by interpolation as described above.

【００２８】ステップＳ９において、上記処理が２回目
であるかを調べ、２回目でない場合には、ステップＳ１
０において、正読確率Ｐ（ｒ）と予め定められたしきい
値Ｐthとを比較する。正読確率Ｐ（ｒ）がしきい値Ｐth
以上の場合には、上記候補文字と距離値、正読率、確率
パラメータ等を文字認識結果として出力する。また、正
読確率Ｐ（ｒ）がしきい値Ｐthより小さい場合には、図
５のステップＳ１１に行き、Ｋ＝３００に設定してステ
ップＳ４に戻り、ステップＳ２で得た候補文字から、距
離値の小さい順に３００文字分の文字を取り出し、上記
と同様に、ステップＳ５で距離計算を行う。そして、
ステップＳ７〜ステップＳ８において、上記と同様に正
読確率Ｐを求め、ステップＳ１２において、得られた候
補文字と距離値、正読率、確率パラメータ等を文字認識
結果として出力する。
In step S9, it is checked whether the above processing is the second pass. If it is not, the correct reading probability P(r) is compared with a predetermined threshold value Pth in step S10. If P(r) is equal to or greater than Pth, the candidate characters, distance values, correct reading probability, probability parameter, and so on are output as the character recognition result. If P(r) is smaller than Pth, the process proceeds to step S11 of FIG. 5, sets K = 300, and returns to step S4, where 300 characters are taken from the candidate characters obtained in step S2 in ascending order of distance value; the distance calculation is then performed in step S5 as before. In steps S7 and S8, the correct reading probability P is obtained in the same way, and in step S12 the resulting candidate characters, distance values, correct reading probability, probability parameter, and so on are output as the character recognition result.
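The flow of FIGS. 5 and 6 can be sketched end to end as below. Because equation (1) and the contents of T(r) are not given in this section, both are passed in as caller-supplied functions (`probability_parameter`, `correct_reading_probability`); the squared Euclidean distance and separate M- and N-dimensional input vectors are further assumptions. All names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Candidate = Tuple[str, float]  # (character, distance)


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (assumed; the text does not fix the metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def recognize(
    pattern_m: Sequence[float],
    pattern_n: Sequence[float],
    dict_m: Dict[str, Sequence[float]],
    dict_n: Dict[str, Sequence[float]],
    probability_parameter: Callable[[float, float], float],
    correct_reading_probability: Callable[[float], float],
    p_th: float,
    k_values: Sequence[int] = (70, 300),
) -> Tuple[List[Candidate], float, float]:
    """Two-pass recognition following steps S2-S12 of FIGS. 5 and 6.

    Returns (candidates sorted by distance, probability parameter r, P(r)).
    """
    # S2: large classification over all L characters (L x M distance work).
    coarse = sorted(dict_m, key=lambda c: _dist(pattern_m, dict_m[c]))
    for pass_no, k in enumerate(k_values, start=1):  # S3 / S11: set K
        # S4-S6: take the K nearest characters, re-match in N dims, sort.
        detailed = sorted(
            ((c, _dist(pattern_n, dict_n[c])) for c in coarse[:k]),
            key=lambda item: item[1],
        )
        if len(detailed) < 2:
            raise ValueError("need at least two candidates to compute r")
        # S7: r from d1 and d2 (equation (1) is supplied by the caller).
        r = probability_parameter(detailed[0][1], detailed[1][1])
        # S8: look up P(r) in the conversion table T(r).
        p = correct_reading_probability(r)
        # S9/S10: accept an early pass if P(r) >= Pth; S12: the last pass is always output.
        if p >= p_th or pass_no == len(k_values):
            return detailed, r, p
    raise ValueError("k_values must not be empty")
```

Only the cheap first pass runs when its result is already trusted; the K2 pass is paid for only on low-confidence inputs, which is where the speed gain described in the text comes from.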

【００２９】以上のように本実施例においては、大分類
照合を行ったのち、まず、Ｋ１文字分の候補文字につい
て詳細分類照合を行って正読確率を求め、正読確率がし
きい値以上のとき、上記詳細分類照合による文字認識結
果を出力し、上記正読確率がしきい値より小さいとき、
Ｋ２文字（Ｋ１≦Ｋ２）文字分の候補文字について詳細
分類照合を行ってその結果を文字認識結果として出力し
ており、比較的精度は低いが高速なＫ１文字分の候補文
字による照合処理と、高精度だが比較的低速なＫ２文字
分（Ｋ１≦Ｋ２）の照合処理を統合して文字認識を行っ
ているので、高速で高精度な文字認識を行うことができ
る。
As described above, in this embodiment, after large classification matching, detailed classification matching is first performed on K1 candidate characters to obtain the correct reading probability. If the correct reading probability is equal to or greater than the threshold value, the character recognition result of this detailed classification matching is output; if it is smaller than the threshold value, detailed classification matching is performed on K2 (K1 ≤ K2) candidate characters and that result is output as the character recognition result. Because character recognition integrates relatively low-accuracy but fast matching over K1 candidate characters with high-accuracy but relatively slow matching over K2 candidate characters, it can be both fast and accurate.

【００３０】なお、上記第１、第２の実施例において
は、第１候補文字の距離値ｄ１と第２の候補文字の距離
値ｄ２に基づき前記式（１）により確率パラメータｒを
算出しているが、例えば、第１，第２の候補文字の距離
値ｄ１，ｄ２に加えて、第３，第４，…の候補文字の距
離値ｄ３，ｄ４，…を使用して確率パラメータを算出し
たり、あるいは、上記距離値の逆数を用いて確率パラメ
ータを算出する等、上記式（１）以外の式を用いて確率
パラメータを算出してもよい。
In the first and second embodiments, the probability parameter r is calculated by equation (1) from the distance value d1 of the first candidate character and the distance value d2 of the second candidate character. However, the probability parameter may also be calculated by a formula other than equation (1): for example, it may use the distance values d3, d4, ... of the third, fourth, ... candidate characters in addition to d1 and d2, or it may use the reciprocals of the distance values.
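Equation (1) itself is not reproduced in this section, so the two functions below are purely hypothetical illustrations of the two kinds of alternative the paragraph permits: one that draws on d3, d4, ... in addition to d1 and d2, and one built from reciprocals of the distances. Neither is the patent's formula; either would need its own conversion table T(r) built from learning data.

```python
from typing import Sequence


def parameter_from_top_k(distances: Sequence[float], k: int = 4) -> float:
    """Hypothetical: gap between d1 and the mean of d2..dk (larger = more confident)."""
    ds = sorted(distances)[:k]
    if len(ds) < 2:
        raise ValueError("need at least two candidate distances")
    rest = ds[1:]
    return sum(rest) / len(rest) - ds[0]


def parameter_from_reciprocals(
    distances: Sequence[float], k: int = 4, eps: float = 1e-9
) -> float:
    """Hypothetical: share of 1/d1 among the reciprocals of d1..dk, in (0, 1].

    eps guards against division by zero for an exact match (d = 0).
    """
    ds = sorted(distances)[:k]
    if not ds:
        raise ValueError("need at least one candidate distance")
    inv = [1.0 / (d + eps) for d in ds]
    return inv[0] / sum(inv)
```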

【００３１】[0031]

【発明の効果】以上説明したように、本発明において
は、以下の効果を得ることができる。（１）認識結果の確信度を示す確率パラメータと正読確
率の変換テーブルを生成するとともに、文字認識結果よ
り候補文字の距離値より確率パラメータを算出し、上記
変換テーブルを参照して文字認識結果の正読確率を求め
ているので、意味付けの明確な正読確率を文字認識結果
の評価基準として使用することができ、文字認識結果の
後処理を容易にすることができる。
As described above, the present invention provides the following effects. (1) A conversion table between the probability parameter, which indicates the certainty of a recognition result, and the correct reading probability is generated; the probability parameter is calculated from the distance values of the candidate characters in the character recognition result, and the correct reading probability of the character recognition result is obtained by referring to the conversion table. The correct reading probability, whose meaning is clear, can therefore be used as the evaluation criterion for character recognition results, which makes post-processing of those results easier.

【００３２】（２）文字パターンを入力とし、比較的高
速かつ低精度の第１の文字認識手段で文字認識を行い、
上記文字認識結果として得られた候補文字の距離値から
確率パラメータを計算し、上記確率パラメータとそれに
対応する正読確率を格納したテーブルを参照して、上記
確率パラメータに対応する正読確率の推定値を求め、上
記正読確率の推定値が所定のしきい値より大きいとき、
上記第１の文字認識手段により得られた文字認識結果を
出力し、上記正読確率が所定のしきい値より小さいと
き、比較的低速で高精度な第２の文字認識手段を用いて
上記文字パターンについて文字認識を行い、上記第２の
文字認識手段により得られた文字認識結果を出力するよ
うにしているので、高速で高精度な文字認識を行うこと
ができる。
(2) A character pattern is input and recognized by a relatively fast, low-accuracy first character recognition means; a probability parameter is calculated from the distance values of the candidate characters obtained as the recognition result; and an estimate of the correct reading probability corresponding to that probability parameter is obtained by referring to a table storing probability parameters and their correct reading probabilities. When the estimated correct reading probability is larger than a predetermined threshold value, the character recognition result of the first character recognition means is output; when it is smaller than the threshold value, the character pattern is recognized by a relatively slow, high-accuracy second character recognition means, and that result is output. High-speed and highly accurate character recognition can therefore be performed.
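In its general form (effect (2), claims 3 and 7) the scheme is a cascade of any two recognizers, not necessarily the K1/K2 matchers of the second embodiment. A minimal sketch, assuming each recognizer is a hypothetical callable returning (character, distance) pairs sorted by distance, and that `estimate_p` wraps both the probability parameter calculation and the table lookup:

```python
from typing import Callable, List, Sequence, Tuple

Candidates = List[Tuple[str, float]]  # (character, distance), ascending


def cascade_recognize(
    pattern: Sequence[float],
    fast: Callable[[Sequence[float]], Candidates],
    slow: Callable[[Sequence[float]], Candidates],
    estimate_p: Callable[[float, float], float],
    p_th: float,
) -> Tuple[Candidates, str]:
    """Run the fast recognizer; fall back to the slow one when the estimated
    correct reading probability of the fast result is not above p_th."""
    result = fast(pattern)
    if len(result) < 2:
        raise ValueError("fast recognizer must return at least two candidates")
    # estimate_p maps (d1, d2) to P via the probability parameter and table.
    p = estimate_p(result[0][1], result[1][1])
    if p > p_th:  # the equality case is unspecified in the text; sent to slow here
        return result, "fast"
    return slow(pattern), "slow"
```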

[Brief description of drawings]

【図１】本発明の原理図である。FIG. 1 is a principle diagram of the present invention.

【図２】本発明の第１の実施例のシステムの構成を示す
図である。FIG. 2 is a diagram showing a configuration of a system according to a first embodiment of the present invention.

【図３】図２における変換テーブルＴ(r) の生成を説明
する図である。FIG. 3 is a diagram illustrating generation of the conversion table T(r) in FIG. 2.

【図４】本発明の第２の実施例のシステムの構成を示す
図である。FIG. 4 is a diagram showing a configuration of a system of a second exemplary embodiment of the present invention.

【図５】第２の実施例のシステムの動作を説明する図で
ある。FIG. 5 is a diagram for explaining the operation of the system of the second embodiment.

【図６】第２の実施例のシステムの動作を説明する図
（続き）である。FIG. 6 is a diagram (continuation) for explaining the operation of the system of the second embodiment.

【図７】従来例である。FIG. 7 is a diagram showing a conventional example.

[Explanation of symbols]

１文字認識手段２確率パラメータ算出手段３正読確率出力手段４確率パラメータ−正読確率変換テーブル生成手段５確率パラメータ−正読確率変換テーブル６ａ第１の文字認識手段６ｂ第２の文字認識手段７正読確率推定手段８正読確率比較手段９認識結果出力手段１１認識対象となる文字パターン１２文字認識手段１３確率パラメータ算出手段１５確率パラメータ−正読確率変換テーブル生成手段１４正読確率出力手段２１認識対象となる文字パターン２２特徴ベクトル抽出手段２３大分類照合手段２４詳細分類照合手段２５確率パラメータ算出手段２６正読確率比較手段Ｄ１認識辞書Ｄ２学習データを格納した記憶手段Ｄ１１第１の認識辞書Ｄ１２第２の認識辞書Ｔ(r) 確率パラメータ−正読確率変換テーブル

1 Character recognition means
2 Probability parameter calculating means
3 Correct reading probability output means
4 Probability parameter-correct reading probability conversion table generating means
5 Probability parameter-correct reading probability conversion table
6a First character recognition means
6b Second character recognition means
7 Correct reading probability estimating means
8 Correct reading probability comparing means
9 Recognition result output means
11 Character pattern to be recognized
12 Character recognition means
13 Probability parameter calculating means
15 Probability parameter-correct reading probability conversion table generating means
14 Correct reading probability output means
21 Character pattern to be recognized
22 Feature vector extracting means
23 Large classification matching means
24 Detailed classification matching means
25 Probability parameter calculating means
26 Correct reading probability comparing means
D1 Recognition dictionary
D2 Storage means storing learning data
D11 First recognition dictionary
D12 Second recognition dictionary
T(r) Probability parameter-correct reading probability conversion table

フロントページの続き (56)参考文献特開平１−251100（ＪＰ，Ａ) 特開昭63−225300（ＪＰ，Ａ) 特開昭60−93498（ＪＰ，Ａ) 特開平２−138682（ＪＰ，Ａ) 特開平２−228788（ＪＰ，Ａ) 特開昭59−99586（ＪＰ，Ａ) 特開平４−297975（ＪＰ，Ａ) Ｄ−361 正読確率を用いた高速高精度な文字認識方式，1996年電子情報通信学会情報・システムソサイエティ大会, 日本，1996年８月30日，ｐ．364 (58)調査した分野(Int.Cl.⁷，ＤＢ名) G06K 9/00 - 9/82 Continuation of front page (56) Reference JP-A-1-251100 (JP, A) JP-A-63-225300 (JP, A) JP-A-60-93498 (JP, A) JP-A-2-138682 (JP , A) JP-A-2-228788 (JP, A) JP-A-59-99586 (JP, A) JP-A-4-297975 (JP, A) D-361 High speed and high accuracy using correct reading probability. Character Recognition Method, 1996 IEICE Information and Systems Society Conference, Japan, August 30, 1996, p. 364 (58) Fields surveyed (Int.Cl. ⁷ , DB name) G06K 9/00-9/82

Claims

(57) [Claims]

1. A character recognition method in which a character pattern is input; a character recognition means calculates a plurality of candidate characters corresponding to the input character pattern and a distance value for each candidate character; a probability parameter is calculated based on at least the distance value of the first candidate and the distance value of the second candidate among the candidate characters; and the correct reading probability corresponding to the probability parameter is output by referring to a table storing probability parameters and their corresponding correct reading probabilities.

2. The character recognition method according to claim 1, wherein probability parameters are calculated for a learning character pattern set given in advance, and the conversion table between the probability parameter and the correct reading probability is generated by taking, for each probability parameter value, the ratio of learning character patterns whose first candidate is correct to the number of learning character patterns having that probability parameter as the estimated correct reading probability.

3. A character recognition method characterized in that a character pattern is input and recognized by a relatively fast, low-accuracy first character recognition means; a probability parameter is calculated based on at least the distance value of the first candidate and the distance value of the second candidate among the candidate characters obtained as the character recognition result; an estimated correct reading probability corresponding to the probability parameter is obtained by referring to a table storing probability parameters and their corresponding correct reading probabilities; when the estimated correct reading probability is larger than a predetermined threshold value, the character recognition result obtained by the first character recognition means is output; and when the correct reading probability is smaller than the predetermined threshold value, the character pattern is recognized by a relatively slow, high-accuracy second character recognition means and the character recognition result obtained by the second character recognition means is output.

4. A character recognition method characterized in that: a character pattern is input; a large classification matching means matches the feature vectors stored in a first recognition dictionary against the feature vector of the character pattern to be recognized and outputs K2 candidate characters in ascending order of distance; a detailed classification matching means matches the feature vectors, read from a second recognition dictionary, of the K1 characters with the smallest distance values among the K2 (K1 ≤ K2) candidate characters output by the large classification matching means against the feature vector of the character pattern; a probability parameter is calculated from the distance values of the candidate characters obtained by this matching, and an estimate of the correct reading probability corresponding to the probability parameter is obtained by referring to a table storing probability parameters and their corresponding correct reading probabilities; if the correct reading probability is greater than a threshold value, the candidate character is output as the recognition result; and if the correct reading probability is lower than the threshold value, the feature vectors of the K2 candidate characters output by the large classification matching means are read from the second recognition dictionary and matched against the feature vector of the character pattern, a probability parameter is calculated from the distance values of the candidate characters obtained by this matching, an estimate of the corresponding correct reading probability is obtained by referring to the table, and if that correct reading probability is greater than the threshold value, the candidate character is output as the recognition result.

5. A character recognition device comprising: character recognition means that receives a character pattern and outputs a plurality of candidate characters corresponding to the input character pattern together with their distance values; probability parameter calculating means that calculates a probability parameter based on at least the distance value of the first candidate and the distance value of the second candidate among the plurality of candidate characters obtained by the character recognition means; and correct reading probability output means that has a conversion table storing an estimated correct reading probability for each probability parameter and, by referring to the conversion table, obtains and outputs the correct reading probability corresponding to the probability parameter determined by the probability parameter calculating means.

6. The character recognition device according to claim 5, wherein the conversion table storing the estimated correct reading probabilities is obtained by calculating probability parameters for a learning character pattern set given in advance, calculating the estimated correct reading probability from the number of learning character patterns having the same probability parameter and the ratio of those patterns whose first candidate was correct, and storing each estimated correct reading probability together with its corresponding probability parameter in a storage means.

7. A character recognition device comprising: a relatively fast, low-accuracy first character recognition means that recognizes an input character pattern; a relatively slow, high-accuracy second character recognition means that recognizes an input character pattern; a conversion table storing probability parameters and their corresponding correct reading probabilities; correct reading probability estimating means that calculates a probability parameter based on at least the distance value of the first candidate and the distance value of the second candidate among the candidate characters and obtains the estimated correct reading probability corresponding to the probability parameter by referring to the conversion table; and means that compares the estimated correct reading probability obtained by the correct reading probability estimating means for the candidate characters of the first character recognition means with a predetermined threshold value, and outputs the character recognition result obtained by the first character recognition means when the estimated correct reading probability is larger than the threshold value, and the character recognition result obtained by the second character recognition means when it is smaller than the threshold value.

8. A character recognition device comprising: large classification matching means that receives a character pattern, matches the feature vectors of the characters stored in a first recognition dictionary against the feature vector of the character pattern to be recognized, and outputs K2 candidate characters; detailed classification matching means that matches the feature vectors, read from a second recognition dictionary, of the K1 characters with the smallest distance values among the K2 (K1 ≤ K2) candidate characters output by the large classification matching means against the feature vector of the character pattern; a conversion table storing probability parameters and their corresponding correct reading probabilities; probability parameter calculating means that calculates a probability parameter from the distance values of the candidate characters obtained by the matching; correct reading probability estimating means that obtains the estimated correct reading probability corresponding to the calculated probability parameter by referring to the conversion table; and correct reading probability comparing means that outputs the candidate character as the recognition result when the correct reading probability is greater than a threshold value and, when it is lower than the threshold value, requests the detailed classification matching means to match the feature vectors of the K2 candidate characters output by the large classification matching means against the feature vector of the character pattern, and outputs the candidate character as the recognition result if the correct reading probability corresponding to the probability parameter obtained by this matching is greater than the threshold value.
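The table generation recited in claims 2 and 6 (per probability parameter value, the fraction of learning patterns whose first candidate was correct) can be sketched as follows. For a real-valued r, "the same probability parameter" is taken here to mean the same bin of width δr, which is an assumption; the function name and the input format of (r, first-candidate-correct) pairs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def build_conversion_table(
    samples: Iterable[Tuple[float, bool]], delta_r: float
) -> Dict[float, float]:
    """Build T(r) from (probability parameter, first-candidate-correct) pairs.

    Parameters are quantized to multiples of delta_r (an assumption about
    what 'the same probability parameter' means for real-valued r); each
    bin's estimate is (#correct first candidates) / (#patterns in bin).
    """
    totals: Dict[int, int] = defaultdict(int)
    correct: Dict[int, int] = defaultdict(int)
    for r, first_is_correct in samples:
        n = int(r // delta_r)  # bin index: r_n = n * delta_r
        totals[n] += 1
        if first_is_correct:
            correct[n] += 1
    return {n * delta_r: correct[n] / totals[n] for n in sorted(totals)}
```

Bins containing only a few learning patterns give noisy estimates, so the bin width δr trades table resolution against statistical reliability.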