JPH08212300A

JPH08212300A - Method and device for recognizing character

Info

Publication number: JPH08212300A
Application number: JP7020815A
Authority: JP
Inventors: Yumi Nakayama; 由美中山; Shiko Yokozuka; 志行横塚
Original assignee: N T T DATA TSUSHIN KK; NTT Data Communications Systems Corp
Current assignee: N T T DATA TSUSHIN KK; NTT Data Corp
Priority date: 1995-02-08
Filing date: 1995-02-08
Publication date: 1996-08-20

Abstract

PURPOSE: To obtain a character recognition device that derives a new evaluation value of the recognition result so that accurate and stable post-processing results are obtained without depending on the type of character category. CONSTITUTION: The character recognition device collates the feature vector of a character pattern to be recognized with standard vectors to obtain distance values, and outputs the group of character categories ranked as upper candidates in ascending order of distance value together with the sequence of distance values corresponding to the individual character categories. A difference value preparing part 4 prepares a difference value data sequence from the sequence of distance values, and a discrimination coefficient value is extracted from a coefficient memory 5 that stores, for each character category, discrimination coefficient values for discriminating correct reading from misreading. A candidate probability calculation part 6 determines the candidate probability of the first candidate category by taking the inner product of the discrimination coefficient value with the difference value data sequence and the sequence of distance values. In addition, a candidate probability calculation part 7 determines the candidate probability of the i-th (i>1) candidate character by multiplying the first candidate probability by the ratio of the first candidate distance value to the i-th candidate distance value.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a technique for determining the certainty of a per-character recognition result (hereinafter, candidate certainty) in a character recognition device, and in particular to a character recognition method and a character recognition device that help stabilize post-processing, such as character string matching, performed on the basis of the recognition result.

[0002]

[Prior Art] As shown in FIG. 3, a conventional general character recognition device includes a feature extraction unit 1, a dictionary memory 2, and an identification unit 3 that classifies the output of the feature extraction unit 1 using the dictionary memory 2. The feature extraction unit 1 extracts the features of a character pattern obtained when a reading means (not shown), such as a scanner or a character segmentation device, binarizes a character string to be recognized on, for example, a form, character by character; its output is the feature vector of that character pattern. Unless otherwise specified, the term "character" in this specification includes numerals, symbols, and the like in addition to kanji and kana. The dictionary memory 2 stores in advance standard vectors representing the standard features of each character category. The identification unit 3 collates the standard vectors read from the dictionary memory 2 with the feature vector input from the feature extraction unit 1, calculates their similarity, for example a distance value (candidate distance value; hereinafter abbreviated as distance value unless otherwise specified), and sorts the character categories that become upper candidates in descending order of similarity (for distance values, in ascending order of distance value). It then outputs, as the recognition result, category information on the upper candidates and distance value information consisting of the corresponding sequence of distance values.

[0003] As an example of this recognition result, FIG. 4 shows the arrangement of the distance value information, that is, the distance value sequence, obtained when the Mahalanobis distance (a distance measure that compensates for statistical correlation between features) is used. In FIG. 4, the x-axis shows the candidate rank of the recognition result (from 1st to 10th) and the y-axis shows the magnitude of the distance value at each candidate rank; the smaller the distance value, the more likely the candidate category. Pattern A is an example of a pattern for which similar categories exist, and pattern B is an example of a pattern of a character with few strokes.
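As a rough illustration of the identification step described in [0002] and [0003], the following Python sketch ranks character categories by increasing distance between a feature vector and each standard vector. It is a minimal sketch under stated assumptions: the function name `rank_candidates`, the toy two-dimensional vectors, and the use of Euclidean distance in place of the Mahalanobis distance mentioned above are all illustrative choices, not part of the specification.

```python
import math

def rank_candidates(feature, standard_vectors, top_n=10):
    """Return (categories, distances) for the top_n closest categories.

    Euclidean distance is an assumption made for brevity; the text above
    mentions the Mahalanobis distance as one possible measure.
    """
    scored = sorted(
        (math.dist(feature, ref), category)
        for category, ref in standard_vectors.items()
    )
    top = scored[:top_n]
    return [category for _, category in top], [dist for dist, _ in top]

# Hypothetical dictionary memory with three categories.
dictionary_memory = {"A": [0.0, 0.0], "B": [3.0, 4.0], "C": [1.0, 0.0]}
categories, distances = rank_candidates([0.0, 0.0], dictionary_memory, top_n=3)
```

The two returned lists correspond to the category information and the distance value information that the identification unit outputs as the recognition result.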

[0004] Some character recognition devices use the distance value as the evaluation value of the above recognition result and perform post-processing such as word matching and character string matching based on this evaluation value. For example, "Word Reading Method" in Japanese Patent Application Laid-Open No. 63-121989 and "Character Recognition Result Correction Method" in Japanese Patent Application No. 6-50865 describe post-processing that uses the distance value as the value representing the certainty of each candidate character in the recognition result.

[0005]

[Problems to be Solved by the Invention] Normally, when a candidate category sequence and the corresponding distance value sequence are obtained for a character pattern to be recognized, the shape of the distance value distribution for each category, for example its mean and variance, differs depending on whether similar categories exist, whether the character is kanji or non-kanji, and so on. In the example of FIG. 4, the variance of the distance values tends to be small when similar categories exist, as in pattern A, and the distance values tend to be small overall for a character with few strokes, as in pattern B.

[0006] In other words, the distance values obtained as the recognition result tend to have various distribution shapes depending on the type of character category. Consequently, in the conventional approach of performing post-processing based on distance values, the correct reading rate and the misreading rate vary with the type of character category, and stable post-processing results cannot be obtained.
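The category dependence described in [0005] and [0006] can be illustrated with a small Python sketch. The two distance value sequences below are made-up numbers chosen only to mimic the tendencies attributed to pattern A and pattern B; they are not read from FIG. 4, and the fixed threshold is a hypothetical post-processing rule, not one taken from the cited prior art.

```python
from statistics import mean, pvariance

def summarize(distances):
    """Return (mean, population variance) of a distance value sequence."""
    return mean(distances), pvariance(distances)

# Made-up sequences for illustration only (not taken from FIG. 4).
pattern_a = [10.0, 11.0, 12.0, 13.0]  # similar categories exist: small spread
pattern_b = [1.0, 3.0, 5.0, 7.0]      # few strokes: small distances overall

mean_a, var_a = summarize(pattern_a)
mean_b, var_b = summarize(pattern_b)

# A single hypothetical threshold on the first-rank distance treats the two
# categories differently, regardless of whether each first candidate is correct.
threshold = 5.0
accepted_a = pattern_a[0] < threshold
accepted_b = pattern_b[0] < threshold
```

With these values the first candidate of pattern A is rejected and that of pattern B is accepted purely because of the scale of their distance values, which is the kind of instability the invention sets out to avoid.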

[0007] In view of this problem, an object of the present invention is to provide a technique for obtaining a new evaluation value of the recognition result, from which highly accurate and stable post-processing results can be derived without depending excessively on the type of character category.

[0008]

[Means for Solving the Problems] Candidate certainty can be used from two viewpoints: as an evaluation value indicating whether the result output from the character recognition device of FIG. 3 is itself reliable, and as an evaluation value indicating how reliable each of the plural candidate categories is. From the former viewpoint, one candidate certainty is output per character recognition result, and it can serve, for example, as a criterion for reject determination. The present applicant has proposed this in "Character Recognition Method and Character Recognition Device Using the Same" in the specification of Japanese Patent Application No. 5-75071, and the intended effect has been obtained. In contrast, the present invention assigns a candidate certainty to each individual candidate category in the character recognition result so that post-processing based on candidate certainty can be performed stably. This corresponds to the latter viewpoint.

[0009] That is, in the present invention, a feature vector representing the features of the character pattern to be recognized is collated with standard vectors prepared in advance for each character category to calculate a distance value for each character category; category information on the character categories that become upper candidates in ascending order of distance value, and distance value information consisting of the distance values corresponding to those categories, are output as the recognition result; and difference value data between adjacent distance values is then created from the distance value information. The inner product of the sequence of created difference value data and the distance value information with discrimination coefficient values, which distinguish the correct-reading tendency from the misreading tendency of each category, is then taken to determine the candidate certainty of the first-rank candidate category of the recognition result. To determine the candidate certainty of the i-th candidate category, the candidate certainty of the first-rank candidate category is multiplied by, for example, the ratio of the first-rank candidate distance value to the i-th candidate distance value. Because the main point of the present invention is to determine an optimal candidate certainty for each individual candidate character, the candidate certainty of the i-th candidate character may also be determined by a method other than the above calculation.

[0010] Such a method can be carried out by a character recognition device having the following configuration.
(1) A feature extraction unit that creates a feature vector representing the features of the character pattern to be recognized.
(2) A dictionary memory in which a standard vector for each character category is stored in advance.
(3) An identification unit that outputs, as the recognition result, category information on the character categories that become upper candidates in ascending order of the distance values obtained by collating the feature vector with the standard vectors stored in the dictionary memory, and distance value information consisting of the sequence of distance values corresponding to the individual character categories.
(4) A coefficient memory that stores, for each category, discrimination coefficient values for distinguishing the correct-reading tendency from the misreading tendency of each character category.
(5) Difference value data creating means that creates, from the distance value information, difference value data between adjacent distance values, which expresses the distribution shape of the distance value sequence.
(6) First candidate certainty determining means that determines the candidate certainty of the first-rank candidate category of the recognition result by taking the inner product of the sequence of created difference value data and the distance value information with the discrimination coefficient values of the corresponding character category in the coefficient memory.
(7) Second candidate certainty determining means that determines the candidate certainty of the i-th candidate character of the recognition result from the first-rank candidate certainty, based on the ratio of the first-rank candidate distance value to the i-th candidate distance value.
The difference value data creating means and the first and second candidate certainty determining means are realized by, for example, a programmed information processing device.

[0011]

[Operation] In the present invention, discrimination coefficient values for each category, which distinguish the correct-reading tendency from the misreading tendency, are stored in the coefficient memory based on parameters collected in advance. Difference value data between adjacent distance values is created from the distance value information output from the identification unit, and the discrimination coefficient values corresponding to the first-rank candidate category are selected from the coefficient memory using the category information of the recognition result. The first candidate certainty determining means takes the inner product of these discrimination coefficient values with the distance value information and the sequence of difference value data (hereinafter, difference value data sequence) to determine the candidate certainty of the first-rank candidate category. Here, the difference value data sequence expresses the distribution shape of the entire distance value sequence. Next, the second candidate certainty determining means multiplies the first-rank candidate certainty by the ratio of the first-rank candidate distance value to the i-th candidate distance value to determine the candidate certainty of the i-th candidate character. Reflecting the correct-reading/misreading tendency of each category and treating the distribution shape of the distance value sequence in a composite manner in this way increases the validity of the first-rank candidate certainty. Because the candidate certainties of the second and subsequent candidate characters are obtained from this first-rank candidate certainty, which is more valid and stable than the first-rank candidate distance value, a more suitable candidate certainty can be derived for each individual candidate character.

[0012]

[Embodiment] An embodiment of the present invention is described in detail below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a character recognition device according to an embodiment of the present invention; elements identical to those in FIG. 3, which shows the configuration of the conventional device, are given the same reference numerals. In the character recognition device of this embodiment, a first candidate certainty calculation unit 6 is provided downstream of the identification unit 3. Of the outputs of the identification unit 3, the category information is led directly to the first candidate certainty calculation unit 6, and the distance value information is led to it via a difference value creating unit 4. A coefficient memory 5 is connected to the first candidate certainty calculation unit 6. Of the outputs of the first candidate certainty calculation unit 6, the distance value information and the candidate certainty are led to a second candidate certainty calculation unit 7. The distance value information and candidate certainties output from the second candidate certainty calculation unit 7, together with the category information output from the identification unit 3, constitute the final recognition result of this character recognition device.

[0013] Here, the difference value creating unit 4 creates, from the distance value information output from the identification unit 3, a plurality of difference value data representing the distribution shape of the entire distance value sequence. The coefficient memory 5 stores discrimination coefficient values learned in advance by discriminant analysis so as to discriminate the correct-reading tendency from the misreading tendency for each category. The first candidate certainty calculation unit 6 takes, using a predetermined discriminant function, the inner product of the difference value data sequence created by the difference value creating unit 4 with the discrimination coefficient values determined by the first-rank candidate category of the recognition result, and outputs the discriminant function value as the first-rank candidate certainty. The second candidate certainty calculation unit 7 calculates the candidate certainties of the second and subsequent ranks based on the first-rank candidate certainty and the candidate distance value sequence.

[0014] FIG. 2 shows an example of the contents of the coefficient memory 5; reference numeral 11 denotes a category column and 12 a discrimination coefficient column for each category. The coefficient values in the discrimination coefficient column 12 are assumed to be set in advance. Next, the candidate certainty determination processing of this character recognition device is described using a specific example. It is assumed here that the identification unit 3 outputs the following category information C and distance value sequence (distance value information) d.

[0015]

[Equation 1]
$$C = \{C_i \mid i = 1, 2, \ldots, I\}, \qquad d = \{d_i \mid i = 1, 2, \ldots, I\}$$

[0016] Here, $C_i$ is the i-th candidate category, $d_i$ is the i-th candidate distance value, and $I$ is the maximum number of candidate categories considered. First, the difference value creating unit 4 receives the distance value sequence d and obtains the difference values between adjacent distance values, that is, the difference value data sequence d'. An example of this difference value data sequence d' is shown below.

[0017]

[Equation 2]
$$d' = \{d'_i = d_{i+1} - d_i \mid i = 1, 2, \ldots, I-1\}$$
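Equation 2 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `difference_sequence` and the sample distance values are assumptions introduced here, and indices are zero-based in the code whereas the specification counts ranks from 1.

```python
def difference_sequence(d):
    """Return d' where d'[i] = d[i + 1] - d[i] (Equation 2, zero-based)."""
    return [d[i + 1] - d[i] for i in range(len(d) - 1)]

# Hypothetical distance value sequence for I = 4 candidates, sorted ascending.
d = [2.0, 3.5, 4.0, 7.0]
d_prime = difference_sequence(d)
```

For I distance values the result has I - 1 entries, so d and d' together supply the 2I - 1 values that the discrimination coefficients are applied to.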

[0018] Here, $d'_i$ is the difference value data corresponding to the i-th candidate category. Next, the first candidate certainty calculation unit 6 focuses on the first-rank candidate category $C_1$ and selects and extracts from the coefficient memory 5 the discrimination coefficient values w used to calculate the first-rank candidate certainty. For example, if the first-rank candidate category $C_1$ is "阿", the discrimination coefficient values w are given from FIG. 2 by the following equation.

[0019]

[Equation 3]
$$w = (w_{41}, w_{42}, \ldots, w_{4m}, \ldots, w_{4(2I-1)})$$

[0020] The first-rank candidate certainty of the recognition result is then calculated as follows from a discriminant function f(w, d, d') whose parameters are the distance value sequence d of the recognition result, the difference value data sequence d' obtained by the difference value creating unit 4, and the discrimination coefficient values w.

[0021]

[Equation 4]
$$\text{first-rank candidate certainty} = f(w, d, d') = w_{41} d_1 + w_{42} d_2 + \cdots + w_{4I} d_I + w_{4(I+1)} d'_1 + w_{4(I+2)} d'_2 + \cdots + w_{4(2I-1)} d'_{I-1}$$
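The coefficient lookup and the inner product of Equation 4 can be sketched as follows. This is a minimal sketch, assuming the coefficient memory is modeled as a Python dictionary keyed by category; the coefficient values, the distance values, and the name `first_candidate_certainty` are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def first_candidate_certainty(w, d, d_prime):
    """Inner product of w with d followed by d' (Equation 4)."""
    values = list(d) + list(d_prime)
    if len(w) != len(values):
        raise ValueError("w must have 2I - 1 entries for I distance values")
    return sum(w_k * x_k for w_k, x_k in zip(w, values))

# Hypothetical coefficient memory: category -> 2I - 1 coefficients (I = 3).
coefficient_memory = {"阿": [-0.5, 0.0, 0.25, 1.0, 0.5]}

d = [2.0, 4.0, 8.0]       # distance value sequence, ascending
d_prime = [2.0, 4.0]      # adjacent differences of d
w = coefficient_memory["阿"]  # selected by the first-rank candidate category
certainty_1 = first_candidate_certainty(w, d, d_prime)
```

Here the first I coefficients weight the distance values and the remaining I - 1 weight the differences, matching the index ranges in Equation 4.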

[0022] The way the distance values are used and the way the difference value data sequence d' is created above are merely examples, and other implementations are possible. The second candidate certainty calculation unit 7 then calculates the candidate certainties of the second and subsequent ranks from the first-rank candidate certainty and the distance value sequence d as follows.

[0023]

[Equation 5]
$$\text{i-th candidate certainty} = \text{first-rank candidate certainty} \times \frac{d_1}{d_i}, \qquad 2 \le i \le I$$
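Equation 5 can be sketched in Python as below. The function name `ranked_certainties`, the input values, and the guard against non-positive distance values are assumptions added for illustration; the specification does not state how a zero distance value should be handled.

```python
def ranked_certainties(certainty_1, d):
    """Return candidate certainties for ranks 1..I using Equation 5."""
    if any(d_i <= 0 for d_i in d[1:]):
        # Assumption: distance values of rank 2 and below must be positive.
        raise ValueError("distance values for ranks 2..I must be positive")
    return [certainty_1] + [certainty_1 * d[0] / d_i for d_i in d[1:]]

# Hypothetical first-rank certainty and distance value sequence (I = 3).
certainties = ranked_certainties(5.0, [2.0, 4.0, 8.0])
```

Because the distance values are sorted in ascending order, the ratio $d_1 / d_i$ is at most 1, so when the first-rank certainty is positive the certainties do not increase with rank.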

[0024] The candidate certainties of ranks 1 to I are then applied to the subsequent post-processing as values quantifying the certainty of the individual candidate characters.

[0025] As described above, in this embodiment, when the certainty of the recognition result is quantified by the discriminant function, the discrimination coefficient values w corresponding to the tendency of each candidate category $C_i$ are used, and the distance value sequence d and the difference value data sequence d' are obtained so that the distance value information is expressed in a composite manner that reveals the distribution shape of the entire distance value sequence. The validity and stability of the candidate certainty are therefore markedly higher than when, as in the conventional approach, the result is expressed only by distance value information, which depends excessively on the category. Thus, for example, in post-processing such as character matching that would otherwise use the distance value information of the character recognition result as is, using the individual candidate certainties derived above in place of the distance value information makes it possible to raise the correct reading rate and lower the misreading rate. Accordingly, when the method of this embodiment is applied to a character recognition device used for data entry work at a private or public institution, advantages arise such as shorter correction time for the operator and less manual intervention.

[0026] The use of the distance value sequence d and the creation of the difference value data sequence d' described above are merely examples; the invention is of course not limited to the contents of this embodiment, and other implementations are possible.

[0027]

[Effects of the Invention] As is clear from the above description, the present invention has the effect that an optimal candidate certainty is obtained for each individual candidate category of the character pattern to be recognized. In addition, because the distance value information is treated in a composite manner so that the distribution shape of the entire distance value sequence can be grasped, the most appropriate candidate certainty for each candidate category is calculated stably. This makes it possible to derive a new evaluation value for obtaining highly accurate and stable post-processing results without depending on the category, and thus to solve the conventional problem.

[Brief description of drawings]

FIG. 1 is a block diagram of a character recognition device according to an embodiment of the present invention.

FIG. 2 is an explanatory diagram of an example of the contents of the coefficient memory provided in the character recognition device of the embodiment.

FIG. 3 is a basic configuration diagram of a conventional character recognition device.

FIG. 4 is a measured plot showing the relationship between candidate rank and distance value in character recognition processing by a conventional character recognition device.

[Brief description of reference numerals]

1 Feature extraction unit
2 Dictionary memory
3 Identification unit
4 Difference value creating unit
5 Coefficient memory
6 First candidate certainty calculation unit
7 Second candidate certainty calculation unit

Claims

1. A character recognition method including a step of collating a feature vector representing the features of a character pattern to be recognized with standard vectors prepared in advance for each character category to calculate a candidate distance value for each category, and outputting, as a recognition result, category information on the character categories that become upper candidate categories in ascending order of candidate distance value and distance value information consisting of the sequence of candidate distance values corresponding to the individual character categories, the method comprising: a step of creating difference value data between adjacent candidate distance values from the distance value information; and a step of determining the candidate certainty of the highest-ranked (hereinafter, first-rank) candidate category of the recognition result by taking the inner product of the sequence of created difference value data and the distance value information with discrimination coefficient values for distinguishing the correct-reading tendency from the misreading tendency of each character category.

2. The character recognition method according to claim 1, further comprising a step of deriving the candidate certainty of the i-th (i > 1) candidate category of the recognition result by multiplying the determined candidate certainty of the first-rank candidate category by the ratio of the first-rank candidate distance value to the i-th candidate distance value.

3. A character recognition device including a feature extraction unit that creates a feature vector representing the features of a character pattern to be recognized, a dictionary memory in which a standard vector for each character category is stored in advance, and an identification unit that outputs, as a recognition result, category information on the character categories that become upper candidates in ascending order of the candidate distance values obtained by collating the feature vector with the standard vectors stored in the dictionary memory and distance value information consisting of the sequence of candidate distance values corresponding to the individual character categories, the device comprising: a coefficient memory that stores, for each category, discrimination coefficient values for distinguishing the correct-reading tendency from the misreading tendency of each character category; difference value data creating means that creates difference value data between adjacent candidate distance values from the distance value information; and first candidate certainty determining means that determines the candidate certainty of the first-rank candidate category of the recognition result by taking the inner product of the sequence of created difference value data and the distance value information with the discrimination coefficient values of the corresponding character category in the coefficient memory.

4. The character recognition device according to claim 3, further comprising second candidate certainty determining means that determines the candidate certainty of the i-th (i > 1) candidate character of the recognition result by multiplying the first-rank candidate certainty by the ratio of the first-rank candidate distance value to the i-th candidate distance value.