JPH08212300A - Method and device for recognizing character - Google Patents

Method and device for recognizing character

Info

Publication number
JPH08212300A
Authority
JP
Japan
Prior art keywords
candidate
character
category
distance value
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP7020815A
Other languages
Japanese (ja)
Inventor
Yumi Nakayama
由美 中山
Shiko Yokozuka
志行 横塚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
N T T DATA TSUSHIN KK
NTT Data Corp
Original Assignee
N T T DATA TSUSHIN KK
NTT Data Communications Systems Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by N T T DATA TSUSHIN KK, NTT Data Communications Systems Corp
Priority to JP7020815A
Publication of JPH08212300A
Legal status: Pending

Landscapes

  • Character Discrimination (AREA)

Abstract

PURPOSE: To provide a character recognition device that derives a new evaluation value for recognition results, so that accurate and stable post-processing results are obtained regardless of the type of character category.

CONSTITUTION: The character recognition device collates the feature vector of the character pattern to be recognized with standard vectors to obtain distance values, and outputs the group of character categories ranked as candidates in ascending order of distance value together with the sequence of distance values corresponding to the individual character categories. A difference value creating unit 4 creates a difference value data sequence from the sequence of distance values, and a discrimination coefficient value is read from a coefficient memory 5, which stores, for each character category, discrimination coefficient values for distinguishing correct readings from misreadings. A first candidate certainty calculation unit 6 determines the candidate certainty for the first candidate category by taking the inner product of the difference value data sequence and the sequence of distance values with the discrimination coefficient values. A second candidate certainty calculation unit 7 then determines the candidate certainty for the i-th candidate character (i > 1) by multiplying the first candidate certainty by the ratio of the first candidate distance value to the i-th candidate distance value.

Description

Detailed Description of the Invention

【0001】[0001]

[Field of Industrial Application] The present invention relates to a technique for determining the certainty of a per-character recognition result (hereinafter, candidate certainty) in a character recognition device, and in particular to a character recognition method and a character recognition device that help stabilize post-processing, such as character string matching, performed on the basis of the recognition result.

【0002】[0002]

[Prior Art] As shown in FIG. 3, a conventional general character recognition device has a feature extraction unit 1, a dictionary memory 2, and an identification unit 3 that classifies the output of the feature extraction unit 1 using the dictionary memory 2. The feature extraction unit 1 extracts the features of a character pattern obtained when a reading means (not shown), such as a scanner or a character segmentation device, binarizes the character string to be recognized (on a form, for example) one character at a time; its output is the feature vector of that character pattern. Unless otherwise noted, "character" in this specification includes numerals, symbols, and the like in addition to kanji and kana. The dictionary memory 2 stores in advance standard vectors that represent the standard features of each character category. The identification unit 3 collates the standard vectors read from the dictionary memory 2 with the feature vector received from the feature extraction unit 1, calculates their similarity, for example a distance value (candidate distance value; hereinafter simply "distance value" unless otherwise noted), and sorts the character categories into ranked candidates in descending order of similarity (for distance values, in ascending order of distance). It then outputs, as the recognition result, category information for the higher-ranked candidates and distance value information consisting of the corresponding sequence of distance values.
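The ranking performed by the identification unit can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes plain Euclidean distance as the similarity measure, and the names `rank_candidates`, `standard_vectors`, and `top_n` are hypothetical.

```python
import math

def rank_candidates(feature, standard_vectors, top_n=10):
    """Rank character categories by ascending distance to the feature vector.

    feature: sequence of floats (the feature vector of the input pattern).
    standard_vectors: dict mapping category -> standard vector.
    Returns (categories, distances), both ordered from best to worst candidate.
    Euclidean distance is an assumption made for this sketch.
    """
    scored = [
        (math.dist(feature, vector), category)
        for category, vector in standard_vectors.items()
    ]
    scored.sort()  # smallest distance first, i.e. most similar first
    top = scored[:top_n]
    categories = [category for _, category in top]
    distances = [distance for distance, _ in top]
    return categories, distances
```

The two returned lists correspond to the category information and the distance value information that make up the recognition result.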

[0003] As an example of this recognition result, FIG. 4 shows the ordering of the distance value information, that is, the distance value sequence, obtained when the Mahalanobis distance (a distance measure that accounts for statistical correlation between features) is used. In FIG. 4, the x-axis shows the candidate rank of the recognition result (from 1st to 10th) and the y-axis shows the magnitude of the distance value at each rank; the smaller the distance value, the more likely the candidate category is correct. Pattern A is an example of a pattern for which similar categories exist, and pattern B is an example of a pattern for a character with few strokes.

[0004] Some character recognition devices use the distance value as the evaluation value of the recognition result and perform post-processing, such as word matching or character string matching, on the basis of this evaluation value. For example, the "word reading method" of Japanese Patent Application Laid-Open No. 63-121989 and the "character recognition result correction method" of Japanese Patent Application No. 6-50865 describe post-processing that uses the distance value as the value expressing the certainty of each candidate character in the recognition result.

【0005】[0005]

[Problems to be Solved by the Invention] When a candidate category sequence and the corresponding distance value sequence are obtained for a character pattern to be recognized, the distribution shape of the distance values (for example, their mean and variance) usually differs from category to category, depending on factors such as whether similar categories exist or whether the character is a kanji. In the example of FIG. 4, the variance of the distance values tends to be small when similar categories exist, as in pattern A, and the distance values tend to be small overall for characters with few strokes, as in pattern B.
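As a rough illustration of what "distribution shape" means here, the sketch below summarizes a distance value sequence by its mean and variance. The function name and the sample sequences are hypothetical; the numbers are made up for illustration and are not read from FIG. 4.

```python
from statistics import mean, pvariance

def distribution_shape(distances):
    """Summarize a distance value sequence as (mean, population variance)."""
    return mean(distances), pvariance(distances)

# Hypothetical sequences (illustrative only, not measured data):
# values that lie close together, as when similar categories exist ...
clustered = [10.0, 10.5, 11.0, 11.5]
# ... and values that are small overall, as for a character with few strokes.
small_overall = [2.0, 4.0, 6.0, 8.0]

clustered_shape = distribution_shape(clustered)
small_shape = distribution_shape(small_overall)
```

Two sequences can differ in either statistic, which is why a single threshold on the raw first-rank distance behaves differently from one category to another.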

[0006] In other words, the distance values obtained as a recognition result tend to have different distribution shapes depending on the type of character category. Consequently, in conventional methods that perform post-processing on the basis of distance values, the correct-reading rate and the misreading rate vary with the type of character category, and stable post-processing results cannot be obtained.

[0007] In view of this problem, an object of the present invention is to provide a technique for obtaining a new evaluation value for recognition results, from which highly accurate and stable post-processing results can be derived without depending excessively on the type of character category.

【0008】[0008]

[Means for Solving the Problems] Candidate certainty can be used in two ways: as an evaluation value that indicates whether the result output by the character recognition device of FIG. 3 is itself reliable, or as an evaluation value that indicates how reliable each of several candidate categories is. In the former view, a single candidate certainty is output per character recognition result and can serve, for example, as a criterion for reject decisions. The present applicant has proposed this approach in "Character recognition method and character recognition device using the same" (Japanese Patent Application No. 5-75071), and it has produced the intended effect. The present invention instead takes the latter view: to stabilize post-processing that relies on candidate certainty, it assigns a candidate certainty to each individual candidate category in the character recognition result.

[0009] Specifically, in the present invention, a feature vector representing the features of the character pattern to be recognized is collated with standard vectors prepared in advance for each character category, and a distance value is calculated for each character category. Category information for the group of character categories ranked as candidates in ascending order of distance value, and distance value information consisting of the distance values corresponding to those categories, are output as the recognition result. Difference value data between adjacent distance values is then created from the distance value information. The candidate certainty for the first-rank candidate category of the recognition result is determined by taking the inner product of the sequence of difference value data and the distance value information with discrimination coefficient values that distinguish the correct-reading tendency from the misreading tendency of each category. To determine the candidate certainty for the i-th candidate category, the candidate certainty of the first-rank candidate category is multiplied, for example, by the ratio of the first-rank candidate distance value to the i-th candidate distance value. Because the main point of the present invention is to determine a suitable candidate certainty for each individual candidate character, the candidate certainty for the i-th candidate character may also be determined by a method other than this calculation.

[0010] This method can be carried out by a character recognition device with the following configuration:
(1) a feature extraction unit that creates a feature vector representing the features of the character pattern to be recognized;
(2) a dictionary memory that stores in advance a standard vector for each character category;
(3) an identification unit that collates the feature vector with the standard vectors stored in the dictionary memory and outputs, as the recognition result, category information for the group of character categories ranked as candidates in ascending order of the resulting distance values, together with distance value information consisting of the sequence of distance values corresponding to the individual character categories;
(4) a coefficient memory that stores, for each category, discrimination coefficient values for distinguishing the correct-reading tendency from the misreading tendency of each character category;
(5) difference value data creating means that creates, from the distance value information, difference value data between adjacent distance values, which expresses the distribution shape of the distance value sequence;
(6) first candidate certainty determining means that determines the candidate certainty for the first-rank candidate category of the recognition result by taking the inner product of the created sequence of difference value data and the distance value information with the discrimination coefficient values of the corresponding character category in the coefficient memory;
(7) second candidate certainty determining means that determines the candidate certainty for the i-th candidate character of the recognition result from the first-rank candidate certainty and the ratio of the first-rank candidate distance value to the i-th candidate distance value.
The difference value data creating means and the first and second candidate certainty determining means are realized, for example, by a programmed information processing device.

【0011】[0011]

[Operation] In the present invention, discrimination coefficient values for distinguishing the correct-reading tendency from the misreading tendency of each category are derived from parameters collected in advance and stored in the coefficient memory. Difference value data between adjacent distance values is created from the distance value information output by the identification unit, and the discrimination coefficient values corresponding to the first-rank candidate category are selected from the coefficient memory according to the category information of the recognition result. The first candidate certainty determining means takes the inner product of these discrimination coefficient values with the distance value information and the sequence of difference value data (hereinafter, difference value data sequence) to determine the candidate certainty for the first-rank candidate category. Here the difference value data sequence expresses the distribution shape of the distance value sequence as a whole. Next, the second candidate certainty determining means multiplies the first-rank candidate certainty by the ratio of the first-rank candidate distance value to the i-th candidate distance value to determine the candidate certainty for the i-th candidate character. Reflecting the correct-reading and misreading tendencies of each category, and treating the distribution shape of the distance value sequence in combination with the distance values themselves, makes the first-rank candidate certainty more valid. Because the candidate certainties of the second and subsequent candidate characters are derived from this first-rank candidate certainty, which is more valid and more stable than the first-rank candidate distance value, a more suitable candidate certainty is obtained for each individual candidate character.

【0012】[0012]

[Embodiment] An embodiment of the present invention is described in detail below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a character recognition device according to an embodiment of the present invention; elements that also appear in FIG. 3, which shows the configuration of the conventional device, carry the same reference numerals. In the character recognition device of this embodiment, a first candidate certainty calculation unit 6 is provided downstream of the identification unit 3. Of the outputs of the identification unit 3, the category information is fed directly to the first candidate certainty calculation unit 6, and the distance value information is fed to it through a difference value creating unit 4. A coefficient memory 5 is connected to the first candidate certainty calculation unit 6. Of the outputs of the first candidate certainty calculation unit 6, the distance value information and the candidate certainty are fed to a second candidate certainty calculation unit 7. The distance value information and candidate certainties output from the second candidate certainty calculation unit 7, together with the category information output from the identification unit 3, form the final recognition result of this character recognition device.

[0013] The difference value creating unit 4 creates, from the distance value information output by the identification unit 3, a plurality of difference value data that represent the distribution shape of the distance value sequence as a whole. The coefficient memory 5 stores discrimination coefficient values learned in advance by discriminant analysis so as to distinguish the correct-reading tendency from the misreading tendency of each category. The first candidate certainty calculation unit 6 uses a predetermined discriminant function to take the inner product of the difference value data sequence created by the difference value creating unit 4 with the discrimination coefficient values determined by the first-rank candidate category of the recognition result, and outputs the value of the discriminant function as the first-rank candidate certainty. The second candidate certainty calculation unit 7 calculates the candidate certainties of the second and subsequent candidates from the first-rank candidate certainty and the candidate distance value sequence.

[0014] FIG. 2 shows an example of the contents of the coefficient memory 5. Reference numeral 11 denotes the category column and 12 denotes the discrimination coefficient column for each category. The coefficient values in the discrimination coefficient column 12 are assumed to have been set in advance. The candidate certainty determination processing of this character recognition device is now described with a specific example. Here it is assumed that the identification unit 3 has output the following category information C and distance value sequence (distance value information) d.

【0015】[0015]

[Equation 1]
$$
C = \{\, C_i \mid i = 1, 2, \dots, I \,\}, \qquad d = \{\, d_i \mid i = 1, 2, \dots, I \,\}
$$

[0016] Here, C_i is the i-th candidate category, d_i is the i-th candidate distance value, and I is the maximum number of candidate categories considered. First, the difference value creating unit 4 receives the distance value sequence d and obtains the differences between adjacent distance values, that is, the difference value data sequence d'. An example of this difference value data sequence d' is shown below.

【0017】[0017]

[Equation 2]
$$
d' = \{\, d'_i \mid d'_i = d_{i+1} - d_i,\ i = 1, 2, \dots, I-1 \,\}
$$
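The difference value data sequence defined above can be computed in a few lines. The following is a minimal sketch; the function name `adjacent_differences` is hypothetical.

```python
def adjacent_differences(distances):
    """Return d' where d'[i] = d[i+1] - d[i] (0-based), of length I - 1.

    distances: the distance value sequence d, ordered by candidate rank.
    """
    return [
        distances[i + 1] - distances[i]
        for i in range(len(distances) - 1)
    ]
```

Because the distance values are sorted in ascending order, every difference is non-negative; a large first difference means the first-rank candidate is well separated from the second.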

[0018] Here, d'_i is the difference value data corresponding to the i-th candidate category. Next, the first candidate certainty calculation unit 6 looks at the first-rank candidate category C_1 and selects from the coefficient memory 5 the discrimination coefficient values w used to calculate the first-rank candidate certainty. For example, if the first-rank candidate category C_1 is "阿", then from FIG. 2 the discrimination coefficient values w are expressed as follows.

【0019】[0019]

[Equation 3]
$$
w = (w_{41}, w_{42}, \dots, w_{4m}, \dots, w_{4(2I-1)})
$$
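The coefficient memory can be pictured as a table keyed by character category. The sketch below is an assumption about one possible layout: the dictionary `COEFFICIENT_MEMORY`, the function `select_coefficients`, and all numeric values are hypothetical placeholders, not contents of FIG. 2. The example uses I = 2, so each entry holds 2I - 1 = 3 coefficients.

```python
# Hypothetical coefficient memory: category -> discrimination coefficients.
# The numeric values are placeholders, not values taken from FIG. 2.
COEFFICIENT_MEMORY = {
    "阿": [0.5, -0.25, 1.0],
    "亜": [0.4, -0.10, 0.8],
}

def select_coefficients(memory, first_category, num_candidates):
    """Return the coefficient vector w for the first-rank candidate category.

    The vector must hold 2I - 1 entries, matching the last index of w:
    one per distance value (I) plus one per adjacent difference (I - 1).
    """
    w = memory[first_category]
    expected = 2 * num_candidates - 1
    if len(w) != expected:
        raise ValueError(f"expected {expected} coefficients, got {len(w)}")
    return w
```

A category missing from the table raises `KeyError` here; how a real device would handle that case is not stated in the source.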

[0020] The first-rank candidate certainty of the recognition result is then calculated as follows from the discriminant function f(w, d, d'), whose parameters are the distance value sequence d of the recognition result, the difference value data sequence d' obtained by the difference value creating unit 4, and the discrimination coefficient values w.

【0021】[0021]

[Equation 4]
$$
\begin{aligned}
\text{first-rank candidate certainty} &= f(w, d, d') \\
&= w_{41} d_1 + w_{42} d_2 + \cdots + w_{4I} d_I \\
&\quad + w_{4(I+1)} d'_1 + w_{4(I+2)} d'_2 + \cdots + w_{4(2I-1)} d'_{I-1}
\end{aligned}
$$
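The discriminant function is an inner product between the coefficient vector and the distance values followed by their adjacent differences. The following is a minimal sketch under that reading; the function name `first_candidate_certainty` is hypothetical.

```python
def first_candidate_certainty(w, distances):
    """Evaluate f(w, d, d') as an inner product.

    w: 2I - 1 discrimination coefficients for the first-rank category.
    distances: the I distance values d_1 .. d_I in ascending order.
    The first I coefficients weight d; the remaining I - 1 weight d',
    where d'_i = d_{i+1} - d_i.
    """
    differences = [
        distances[i + 1] - distances[i]
        for i in range(len(distances) - 1)
    ]
    features = list(distances) + differences
    if len(w) != len(features):
        raise ValueError("w must have 2I - 1 entries")
    return sum(wi * xi for wi, xi in zip(w, features))
```

For example, with I = 2, w = (0.5, -0.25, 1.0) and d = (2.0, 4.0), the difference is d'_1 = 2.0 and the value is 0.5·2.0 - 0.25·4.0 + 1.0·2.0 = 2.0 (the coefficients here are made-up numbers).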

[0022] The way the distance values are used and the difference value data sequence d' is created above is only an example, and other implementations are possible. The second candidate certainty calculation unit 7 then calculates the candidate certainties of the second and subsequent candidates from the first-rank candidate certainty and the distance value sequence d, as follows.

【0023】[0023]

[Equation 5]
$$
\text{$i$-th candidate certainty} = \text{first-rank candidate certainty} \times \frac{d_1}{d_i}, \qquad 2 \le i \le I
$$
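The scaling for the second and subsequent candidates can be sketched as follows. The function name `candidate_certainties` is hypothetical, and the guard against a zero distance value is an added safeguard that the source does not discuss.

```python
def candidate_certainties(first_certainty, distances):
    """Return the certainties for ranks 1..I.

    Rank 1 keeps first_certainty; rank i (i >= 2) gets
    first_certainty * (d_1 / d_i).
    """
    d1 = distances[0]
    certainties = [first_certainty]
    for di in distances[1:]:
        if di == 0:
            # Not covered by the source; avoid division by zero.
            raise ValueError("distance values must be non-zero")
        certainties.append(first_certainty * (d1 / di))
    return certainties
```

Since the distance values are in ascending order, the ratio d_1 / d_i is at most 1, so for a positive first-rank certainty the values never increase with rank.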

[0024] The candidate certainties for ranks 1 through I are then passed to the subsequent post-processing stage as values that quantify the certainty of each individual candidate character.
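The source does not specify how the post-processing stage consumes these values. As one possible illustration only, the sketch below scores lexicon words by summing the candidate certainties of their characters and picks the best match; the scoring rule, the function names, and the sample values are all assumptions.

```python
def score_word(word, position_candidates):
    """Sum the certainties of the word's characters, or None if no match.

    position_candidates: one dict per character position, mapping a
    candidate category to its candidate certainty.
    """
    if len(word) != len(position_candidates):
        return None
    total = 0.0
    for character, candidates in zip(word, position_candidates):
        if character not in candidates:
            return None  # the word is not supported by the candidates
        total += candidates[character]
    return total

def best_word(lexicon, position_candidates):
    """Return the lexicon word with the highest summed certainty, or None."""
    scored = []
    for word in lexicon:
        score = score_word(word, position_candidates)
        if score is not None:
            scored.append((score, word))
    return max(scored)[1] if scored else None

# Hypothetical candidate certainties for a three-character string.
example_candidates = [
    {"c": 0.9, "e": 0.6},
    {"a": 0.8, "o": 0.7},
    {"t": 0.9, "f": 0.2},
]
example_best = best_word(["cat", "cot", "eat", "dog"], example_candidates)
```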

[0025] As described above, when this embodiment quantifies the certainty of the recognition result with the discriminant function, it uses the discrimination coefficients w that correspond to the tendency of each candidate category C_i, and it obtains both the distance value sequence d and the difference value data sequence d' so that the distance value information expresses the distribution shape of the whole distance value sequence. Compared with the conventional representation, which uses only distance value information that depends excessively on the category, the validity and stability of the candidate certainty are markedly improved. As a result, in post-processing such as character matching that would otherwise use the distance value information of the recognition result directly, using the individual candidate certainties derived above in place of the distance value information can raise the correct-reading rate and lower the misreading rate. When the method of this embodiment is applied to a character recognition device used for data entry work at a private or public institution, it therefore offers advantages such as shorter operator correction time and less manual intervention.

[0026] The way the distance value sequence d is used and the difference value data sequence d' is created above is only an example; the invention is of course not limited to the details of this embodiment, and other implementations may be used.

【0027】[0027]

[Effects of the Invention] As is clear from the above description, the present invention makes it possible to obtain a suitable candidate certainty for each individual candidate category of the character pattern to be recognized. Because the distance value information is handled in a combined form that captures the distribution shape of the whole distance value sequence, the most valid candidate certainty for each candidate category can be calculated stably. A new evaluation value for obtaining highly accurate and stable post-processing results can thus be derived without depending on the category, which resolves the conventional problem.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a character recognition device according to an embodiment of the present invention.

FIG. 2 is an explanatory diagram of an example of the contents of the coefficient memory included in the character recognition device of the embodiment.

FIG. 3 is a basic configuration diagram of a conventional character recognition device.

FIG. 4 is a measured plot showing the relationship between candidate rank and distance value in character recognition processing by a conventional character recognition device.

[Explanation of Reference Numerals]

1 Feature extraction unit
2 Dictionary memory
3 Identification unit
4 Difference value creating unit
5 Coefficient memory
6 First candidate certainty calculation unit
7 Second candidate certainty calculation unit

Claims (4)

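[Claims]
1. A character recognition method having a step of collating a feature vector representing the features of a character pattern to be recognized with standard vectors prepared in advance for each character category to calculate a candidate distance value for each category, and outputting, as a recognition result, category information for the group of character categories ranked as candidates in ascending order of candidate distance value and distance value information consisting of the sequence of candidate distance values corresponding to the individual character categories, the method comprising: a step of creating difference value data between adjacent candidate distance values from the distance value information; and a step of determining the candidate certainty for the highest-ranked (hereinafter, first-rank) candidate category of the recognition result by taking the inner product of the created sequence of difference value data and the distance value information with discrimination coefficient values for distinguishing the correct-reading tendency from the misreading tendency of each character category.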
In a character recognition method, which includes a step of outputting, as a recognition result, category information for character category groups that are upper candidate categories in ascending order of candidate distance values and distance value information that is a sequence of candidate distance values corresponding to individual character categories. , A step of creating difference value data between adjacent candidate distance values from the distance value information, and distinguishing the created column of the difference value data and the distance value information from the correct reading tendency and the misreading tendency of each character category. For determining the candidate probability for the highest (hereinafter, first) candidate category of the recognition result by inner product with the discriminant coefficient value for the character recognition method.
2. The character recognition method according to claim 1, further comprising a step of deriving the candidate certainty for the i-th candidate category of the recognition result by multiplying the determined candidate certainty of the first-rank candidate category by the ratio of the first-rank candidate distance value to the i-th (i > 1) candidate distance value.
3. A character recognition device comprising a feature extraction unit that creates a feature vector representing the features of a character pattern to be recognized, a dictionary memory that stores in advance a standard vector for each character category, and an identification unit that collates the feature vector with the standard vectors stored in the dictionary memory and outputs, as a recognition result, category information for the group of character categories ranked as candidates in ascending order of the resulting candidate distance values and distance value information consisting of the sequence of candidate distance values corresponding to the individual character categories, the device further comprising: a coefficient memory that stores, for each category, discrimination coefficient values for distinguishing the correct-reading tendency from the misreading tendency of each character category; difference value data creating means that creates difference value data between adjacent candidate distance values from the distance value information; and first candidate certainty determining means that determines the candidate certainty for the first-rank candidate category of the recognition result by taking the inner product of the created sequence of difference value data and the distance value information with the discrimination coefficient values of the corresponding character category in the coefficient memory.
4. The character recognition device according to claim 3, further comprising second candidate probability determining means for determining a candidate probability for the i-th-rank candidate character of the recognition result by multiplying the first-rank candidate probability by the ratio between the first-rank candidate distance value and the i-th-rank (i > 1) candidate distance value.
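The two calculations recited in claims 3 and 4 can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the sign of the difference (next rank minus current rank), the layout of the coefficient vector (difference coefficients first, then distance coefficients), and the direction of the ratio (first-rank distance divided by i-th-rank distance) are all assumptions made for illustration; the claims only state that an inner product and a ratio are used.

```python
def first_rank_probability(distances, coefficients):
    """Candidate probability for the first-rank category (illustrative).

    distances    -- candidate distance values in ascending order (rank 1 first)
    coefficients -- hypothetical discrimination coefficients for the
                    first-rank category: one per difference value,
                    followed by one per distance value (assumed layout)
    """
    # Difference values between adjacent candidate distance values.
    diffs = [nxt - cur for cur, nxt in zip(distances, distances[1:])]
    features = diffs + list(distances)
    if len(features) != len(coefficients):
        raise ValueError("coefficient count must match feature count")
    # Inner product of the feature sequence with the coefficients.
    return sum(f * c for f, c in zip(features, coefficients))


def ranked_probabilities(p1, distances):
    """Candidate probabilities for every rank, derived from rank 1.

    Assumes the ratio is d_1 / d_i, so a candidate that lies farther
    from the input pattern receives a smaller probability.
    """
    d1 = distances[0]
    if any(d <= 0 for d in distances):
        raise ValueError("distance values must be positive")
    return [p1 * d1 / d for d in distances]
```

With three candidates there are two difference values and three distance values, so the assumed coefficient vector has five entries. The second function returns `p1` itself for rank 1, because the ratio is then 1.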
JP7020815A 1995-02-08 1995-02-08 Method and device for recognizing character Pending JPH08212300A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP7020815A JPH08212300A (en) 1995-02-08 1995-02-08 Method and device for recognizing character

Publications (1)

Publication Number Publication Date
JPH08212300A (en) 1996-08-20

Family

ID=12037539

Family Applications (1)

Application Number Title Priority Date Filing Date
JP7020815A Pending JPH08212300A (en) 1995-02-08 1995-02-08 Method and device for recognizing character

Country Status (1)

Country Link
JP (1) JPH08212300A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102194126A (en) * 2010-03-09 2011-09-21 索尼公司 Information processing apparatus, information processing method, and program
CN111046802A (en) * 2019-12-11 2020-04-21 北大方正集团有限公司 Evaluation method, device and equipment based on vector character and storage medium
CN111046802B (en) * 2019-12-11 2024-01-05 新方正控股发展有限责任公司 Evaluation method, device, equipment and storage medium based on vector words

Similar Documents

Publication Publication Date Title
US7373291B2 (en) Linguistic support for a recognizer of mathematical expressions
Senior A combination fingerprint classifier
EP0847018B1 (en) A data processing method and apparatus for indentifying a classification to which data belongs
KR100249055B1 (en) Character recognition apparatus
US5917941A (en) Character segmentation technique with integrated word search for handwriting recognition
US5982929A (en) Pattern recognition method and system
US7623715B2 (en) Holistic-analytical recognition of handwritten text
US6526396B1 (en) Personal identification method, personal identification apparatus, and recording medium
US8340429B2 (en) Searching document images
KR20010041440A (en) Knowledge-based strategies applied to n-best lists in automatic speech recognition systems
CN112151014B (en) Speech recognition result evaluation method, device, equipment and storage medium
Biswas et al. Writer identification of Bangla handwritings by radon transform projection profile
JPH08212300A (en) Method and device for recognizing character
JP3469375B2 (en) Method for determining certainty of recognition result and character recognition device
US9224040B2 (en) Method for object recognition and describing structure of graphical objects
JPH05314320A (en) Recognition result evaluating system using difference of recognition distance and candidate order
JP2002183667A (en) Character-recognizing device and recording medium
JP3079202B2 (en) Character recognition method and character recognition device
JP2851865B2 (en) Character recognition device
JP2812391B2 (en) Pattern processing method
JP2001243425A (en) On-line character recognition device and method
JP3360030B2 (en) Character recognition device, character recognition method, and recording medium recording character recognition method in program form
JPH06176206A (en) Character recognizing device
JP2000298496A (en) Recognition result discarding method in pattern recognition process and pattern recognition device installing the method
JP3138665B2 (en) Handwritten character recognition method and recording medium