JPH08272904A

JPH08272904A - Pattern recognition device

Info

Publication number: JPH08272904A
Application number: JP7077925A
Authority: JP
Inventors: Shinji Ono; 真司尾野; Shigeko Kaneichi; 滋子金一
Original assignee: Glory Ltd
Current assignee: Glory Ltd
Priority date: 1995-04-03
Filing date: 1995-04-03
Publication date: 1996-10-18

Abstract

PURPOSE: To perform highly reliable recognition of various candidate characters by providing specific dictionary generating means and specific pattern recognizing means. CONSTITUTION: An image input part first reads the image to be recognized (S101) and segments characters from the read image (S102). A character feature extraction part then performs feature extraction (S103) and collates the input character feature quantities with a rough classification dictionary (S104). On the basis of the collation results, a feature extraction method matching each input character is selected from a detailed classification dictionary (S105), and feature extraction for detailed classification is performed with that method (S106). The extracted input character feature quantities are then collated with the detailed classification dictionary (S107) and the results are output (S110). Consequently, even when a recognition target or a character feature extraction method is added, it suffices to run the recognition evaluation with the dictionaries and feature extraction methods after the addition, so maintenance of the dictionary on the dictionary generation side is simplified and high-precision recognition is easy to achieve.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a pattern recognition device, and more particularly to the automatic construction of a detailed classification dictionary that maintains the character recognition rate while reducing the misreading rate.

[0002]

[Prior Art] A character recognition device has previously been proposed that, in the detailed classification of similar recognition candidate characters, uses a single character feature quantity and extracts and collates the feature components suited to classification on the basis of the variance of each feature component (JP-A-1-233678).

[0003]

[Problems to Be Solved by the Invention] In such a device, the feature quantity suited to classification often differs from one pair of candidate characters to another, so a single character feature quantity cannot provide sufficient recognition accuracy.

[0004] The present invention has been made in view of the above circumstances, and its object is to provide a recognition device that can recognize various candidate characters at high speed and with high reliability.

[0005]

[Means for Solving the Problems] The device of the present invention is characterized by comprising: dictionary creating means that, for the detailed classification of character patterns after rough classification, sets up a plurality of feature extraction methods in advance, calculates a classification decision threshold for each feature extraction method, evaluates the classification accuracy, and stores as a dictionary the feature extraction method whose classification accuracy evaluation value is smallest, together with its decision threshold and evaluation value; and pattern recognition means that reads from the dictionary the feature extraction method and decision threshold suited to distinguishing the similar candidates in the rough classification result and performs detailed classification.

[0006] That is, the present invention automatically constructs a dictionary from which the feature quantity and feature components suited to detailed classification can be selected for each candidate character. Deformed character patterns are generated automatically and evaluated for recognition with a wide variety of prepared character feature quantities. As a result, for each candidate character, the one character feature quantity that gives the highest character recognition rate and the lowest misreading rate is selected, and the name of its character feature extraction method, the decision threshold, and the classification accuracy evaluation value are stored as a detailed classification dictionary.

[0007] Using this dictionary, the feature quantity and feature components suited to each pair of candidate characters are selected and detailed classification is performed.

[0008]

[Operation] With this configuration, even when a new recognition target or character feature extraction method is added, running the recognition evaluation with the dictionary and feature extraction methods after the addition is enough to update the detailed classification dictionary. Dictionary maintenance is therefore easy, and so is the addition of new character feature extraction methods for detailed classification.

[0009] In addition, because classification accuracy based on recognition results is used as the means of judging which feature extraction method is suited to recognition, the procedure is simple and registration can be automated.

[0010] Because the input pattern is deformed and the classification accuracy is evaluated with those deformations taken into account, recognition accuracy is improved for deformed character patterns and unknown patterns as well. Furthermore, determining the feature extraction method and decision threshold between the patterns of two categories makes high classification accuracy achievable.

[0011] Instead of selecting one of the plurality of character feature extraction methods, it is also possible to classify with several character feature extraction methods and use their respective output values (distance values) for the classification decision.

[0012]

[Embodiments] Embodiments of the present invention will now be described in detail with reference to the drawings.

[0013] This pattern recognition device is characterized by comprising: dictionary creating means that first generates deformed character patterns automatically, evaluates recognition with a wide variety of prepared character feature quantities, selects for each candidate character the one character feature quantity that gives the highest character recognition rate and the lowest misreading rate, and stores the name of that character feature extraction method, the decision threshold, and the classification accuracy evaluation value as a detailed classification dictionary; and pattern recognition means that uses this dictionary to select the feature quantity and feature components suited to each pair of candidate characters and performs detailed classification.

[0014] Recognition processing using this pattern recognition device is described next with reference to the flowchart shown in FIG. 1.

[0015] First, the image input unit reads the image to be recognized (step 101).

[0016] Next, the character segmentation unit segments characters from the read image (step 102).

[0017] The character feature extraction unit then performs feature extraction (step 103).

[0018] The input character feature quantities are then collated with the rough classification dictionary (step 104). This rough classification dictionary is assumed to have been created in advance in step 108.

[0019] Likewise, the detailed classification dictionary is assumed to have been created in advance in step 109.

[0020] On the basis of the result of collation with the rough classification dictionary, the feature extraction method appropriate for this input character is selected from the detailed classification dictionary (step 105), and feature extraction for detailed classification is performed with that method (step 106).

[0021] The input character feature quantities extracted by the appropriate method are then collated with the detailed classification dictionary (step 107), and the result is output (step 110).
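The flow of steps 103 to 107 can be sketched as follows. This is a minimal illustration rather than the device's actual implementation: the squared Euclidean distance, the dictionary layouts, and all names (`rank_by_distance`, `recognize`, `rough_dict`, `detail_dict`, `extractors`) are assumptions made for the example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def rank_by_distance(features: Vector, refs: Dict[str, Vector]) -> List[Tuple[float, str]]:
    """Rank reference entries by squared Euclidean distance (an assumed metric)."""
    return sorted(
        (sum((f - r) ** 2 for f, r in zip(features, ref)), label)
        for label, ref in refs.items()
    )


def recognize(
    char_image: Vector,
    rough_extract: Callable[[Vector], Vector],
    rough_dict: Dict[str, Vector],
    detail_dict: Dict[Tuple[str, str], Tuple[str, Dict[str, Vector]]],
    extractors: Dict[str, Callable[[Vector], Vector]],
) -> str:
    # Steps 103-104: extract rough features and collate with the rough dictionary.
    ranked = rank_by_distance(rough_extract(char_image), rough_dict)
    first, second = ranked[0][1], ranked[1][1]
    # Step 105: look up the extraction method stored for this candidate pair.
    method_name, detail_refs = detail_dict[(first, second)]
    # Step 106: extract detailed features with the selected method.
    detail_features = extractors[method_name](char_image)
    # Step 107: collate with the detailed dictionary and return the best match.
    return rank_by_distance(detail_features, detail_refs)[0][1]


# Toy data (hypothetical): the "image" is just a list of numbers.
rough_dict = {"8": [1.0], "0": [2.0]}
detail_dict = {("8", "0"): ("tail", {"8": [0.0], "0": [5.0]})}
extractors = {"tail": lambda img: img[1:]}
result = recognize([1.2, 4.0], lambda img: img[:1], rough_dict, detail_dict, extractors)
```

In this toy run the rough stage ranks "8" first and "0" second, and the detailed stage, using the method stored for that pair, returns "0".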

[0022] As shown in the following table, the feature extraction methods combine two kinds of feature value (density gradient features without and with strength weighting), ten kinds of feature vector (a partial feature, a whole-area feature, three vertical symmetry features, three left-right symmetry features, and two diagonal symmetry features), and one threshold setting, giving 2 × 10 × 1 = 20 feature extraction methods.
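The count of 20 methods follows directly from the Cartesian product of the three option sets. The short sketch below enumerates them; the string labels are hypothetical names chosen for illustration and do not appear in the patent.

```python
from itertools import product

# Hypothetical labels for the options listed in paragraph [0022].
FEATURE_VALUES = ["unweighted", "weighted"]
FEATURE_VECTORS = (
    ["partial", "whole_area"]
    + [f"vertical_symmetry_{i}" for i in (1, 2, 3)]
    + [f"left_right_symmetry_{i}" for i in (1, 2, 3)]
    + [f"diagonal_symmetry_{i}" for i in (1, 2)]
)
THRESHOLD_SETTINGS = ["default"]

# Every (feature value, feature vector, threshold setting) combination.
METHODS = list(product(FEATURE_VALUES, FEATURE_VECTORS, THRESHOLD_SETTINGS))
```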

[0023] The feature value is described first. When the input image has 256 gray levels, the density gradient strength takes 1024 levels; the feature value is this density gradient strength normalized to a binary value or to a multi-level value (8 levels) (step 304 in FIG. 10). The feature vectors are shown next.
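As a rough illustration of this normalization, the sketch below maps a strength in the range 0 to 1023 to either two or eight levels. The uniform bin boundaries and the `binary_threshold` default are assumptions; the patent states only that a threshold is set from the density gradient strength distribution, not how the boundaries are chosen.

```python
def normalize_strength(
    strength: int,
    levels: int = 8,
    max_strength: int = 1023,
    binary_threshold: int = 512,
) -> int:
    """Quantize a gradient strength to `levels` values (uniform bins assumed)."""
    if not 0 <= strength <= max_strength:
        raise ValueError("strength out of range")
    if levels == 2:
        # Binary feature value: 1 at or above the (assumed) threshold, else 0.
        return 1 if strength >= binary_threshold else 0
    # Multi-level feature value: equal-width bins over 0..max_strength.
    return strength * levels // (max_strength + 1)
```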

[0024] The creation of the rough classification dictionary is described next with reference to the flowchart in FIG. 2.

[0025] This flowchart corresponds to the rough classification dictionary creation step 108 in the flowchart of FIG. 1.

[0026] First, as shown in FIG. 3, character data is input for the categories 0 to 9 and symbols in fonts A to N (step 201).

[0027] Then, as illustrated in FIG. 4, six kinds of deformed pattern are generated from the reference pattern (step 202).

[0028] Character feature extraction is then performed (step 203) and the character feature quantities are stored (step 204).

[0029] It is then determined whether all deformed patterns have been generated (step 205). If so, the mean of the character feature quantities and the standard deviation for each feature dimension are calculated (step 206).

[0030] If not, processing returns to the deformed pattern generation step 202.

[0031] The mean character feature quantities and the standard deviation for each feature dimension are then stored in the rough classification dictionary for the font and category of this input character data (step 207).

[0032] It is then determined whether processing has finished for all fonts and categories (step 208). If so, this flow ends; if not, processing returns to the first character data input step 201.
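The loop of steps 201 to 208 can be sketched as below. The functions `deform` and `extract` stand in for the deformation generator and the feature extractor, which the patent does not specify in code form; the use of the population standard deviation is likewise an assumption.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Key = Tuple[str, str]  # (font, category)


def build_rough_entry(vectors: List[Vector]) -> Tuple[List[float], List[float]]:
    """Step 206: per-dimension mean and standard deviation over all variants."""
    dims = list(zip(*vectors))
    return [mean(d) for d in dims], [pstdev(d) for d in dims]


def build_rough_dictionary(
    samples: Dict[Key, Vector],
    deform: Callable[[Vector], List[Vector]],
    extract: Callable[[Vector], Vector],
) -> Dict[Key, Tuple[List[float], List[float]]]:
    dictionary = {}
    for key, pattern in samples.items():                  # steps 201 and 208
        vectors = [extract(p) for p in deform(pattern)]   # steps 202 to 205
        dictionary[key] = build_rough_entry(vectors)      # steps 206 and 207
    return dictionary


# Toy run with a hypothetical two-variant deformation.
toy = build_rough_dictionary(
    {("A", "0"): [1.0, 2.0]},
    deform=lambda p: [p, [x + 2 for x in p]],
    extract=lambda p: p,
)
```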

[0033] The creation of the detailed classification dictionary is described next with reference to FIG. 5. Rough classification is performed first: as described for FIG. 2, character data is input for the categories 0 to 9 and symbols in fonts A to N (step 401).

[0034] Deformed patterns are then generated from the reference pattern (step 402). Character feature extraction is performed (step 403), followed by collation with the rough classification dictionary (step 404).

[0035] Classification is then performed with each of feature extraction methods 1 to M (steps 405_1 to 405_M), and, for each feature extraction method and each combination of first-ranked and second-ranked rough classification candidates, the frequency distributions of the distance value and of the distance difference are tallied separately for correct results and errors (step 406). FIG. 6 plots, for one particular feature extraction method, the relationship between the absolute distance value of the first-ranked detailed classification candidate and the relative distance value (the difference between the absolute distance values of the first-ranked and second-ranked detailed classification candidates), with the cases where the first-ranked candidate is correct and where it is an error plotted separately.

[0036] It is then determined whether this processing has been completed for all deformed patterns (step 407). If not, processing returns to the deformed pattern generation step 402; if so, it is determined whether processing has finished for all fonts and categories (step 408).

[0037] If decision step 408 finds that processing has not finished, the flow returns to the beginning; if it has finished, the threshold function is calculated (step 409). Here the threshold function is calculated, using the Mahalanobis distance as described below, from the frequency distributions of correct and error distance values (FIG. 6) tallied for each feature extraction method and each combination of first-ranked and second-ranked detailed classification candidates.

[0038] To obtain the threshold function, the locus of points at which the Mahalanobis distances to the group of correct results and to the group of errors are equal, that is, the boundary between the two groups, is determined, and the result of a linear (first-order) regression on that sequence of points is taken as the threshold function:

threshold function: (relative distance) − a × (absolute distance) − b = 0

When the value of the threshold function obtained in this way is 0 or less, the input is rejected.
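The regression step can be illustrated with an ordinary least-squares line fit. In this sketch the boundary points (pairs of absolute and relative distance at which the two Mahalanobis distances are equal) are taken as given; computing them is outside the example, and the function names are hypothetical.

```python
from typing import List, Tuple


def fit_threshold_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of relative = a * absolute + b through boundary points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def threshold_value(absolute: float, relative: float, a: float, b: float) -> float:
    """Value of the threshold function; 0 or less means reject (paragraph [0038])."""
    return relative - a * absolute - b


# Hypothetical boundary points lying exactly on relative = 2 * absolute + 1.
a, b = fit_threshold_line([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])
```

A pattern with absolute distance 1 and relative distance 4 then gives a threshold value of 1, which is positive, so it would not be rejected.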

[0039] FIG. 7 illustrates how the threshold function is determined. The classification accuracy evaluation value V is then calculated, and the feature extraction method is determined on the basis of this value (step 410). To determine the feature extraction method, the evaluation value V is first calculated with the formula below for each feature extraction method, between the reference table of the first-ranked rough classification candidate and that of the second-ranked candidate, and the method that minimizes V is chosen.

[0040] The evaluation value is

V = α × E + R (E: error rate, R: reject rate, α: constant)

The final detailed classification dictionary stores one feature extraction method with good recognition accuracy for each combination of the reference table of the first-ranked rough classification candidate and the character type of the second-ranked candidate (number of rough classification reference tables × number of character types) (step 411).

[0041] Here α is set to 10, and the method of calculating the error rate and the reject rate is described. FIG. 8 shows an example for the combination in which the first-ranked rough classification candidate is font A, category 0 and the second-ranked candidate is category 8, where the feature extraction method uses the unweighted feature value and the whole-area feature as the feature vector.

[0042] Here the relative and absolute distance values of each input pattern are substituted into the threshold function calculated, using the Mahalanobis distance, from the groups of correct results and errors tallied according to FIG. 5:

(relative distance) − (0.24604) × (absolute distance) − (158.64) = 0

The frequency of input patterns for which this value is negative is taken as the reject rate R, and the frequency with which the threshold function value is positive within the error group (input patterns whose first-ranked detailed classification candidate is an error) is taken as the error rate E.
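The calculation of R, E, and V = α × E + R, and the choice of the method with the smallest V, might look like the sketch below. The coefficients 0.24604 and 158.64 and α = 10 come from the text; the sample data, the function names, and the choice to express both rates as fractions of all input patterns are assumptions.

```python
from typing import Dict, List, Tuple

# (absolute distance, relative distance, first-ranked result is correct)
Sample = Tuple[float, float, bool]


def evaluation_value(samples: List[Sample], a: float, b: float, alpha: float = 10.0) -> float:
    """V = alpha * E + R, with both rates taken over all samples (assumed)."""
    total = len(samples)
    values = [(rel - a * ab - b, ok) for ab, rel, ok in samples]
    rejected = sum(1 for v, _ in values if v < 0)            # negative: rejected
    errors = sum(1 for v, ok in values if v > 0 and not ok)  # accepted but wrong
    return alpha * (errors / total) + rejected / total


def select_method(scores: Dict[str, float]) -> str:
    """Pick the feature extraction method with the smallest evaluation value."""
    return min(scores, key=scores.get)


# Hypothetical samples evaluated with the coefficients quoted in the text.
samples = [(100.0, 400.0, True), (100.0, 100.0, True),
           (200.0, 300.0, False), (200.0, 100.0, False)]
v = evaluation_value(samples, a=0.24604, b=158.64)
```

With these four hypothetical samples, two are rejected and one error is accepted, so R = 0.5, E = 0.25, and V = 3.0.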

[0043] An example of processing using this method is described next. In this example the second-ranked candidate is the candidate whose category differs from that of the first-ranked rough classification candidate. First, as shown in FIG. 9, rough classification is performed on the character pattern and the candidates down to rank M, for example rank two, are selected (step 501).

[0044] The detailed classification dictionary is then consulted, and the feature extraction method, feature quantities, and threshold function for the combination of the first-ranked rough classification candidate C1 and the second-ranked candidate C2 are read from it (step 502).

[0045] Detailed classification features are then extracted from the input character (step 503). This is described in detail later.
[0046] After this, distances are calculated between the detailed classification feature quantities of the first-ranked and second-ranked rough classification candidates and the detailed classification feature quantities of the input character, and the minimum distance value D (absolute distance) and the difference B between the first-ranked and second-ranked distances (relative distance) are obtained (step 504).

[0047] The minimum distance value D (absolute distance) and the distance difference B (relative distance) are then substituted into the threshold function described above (step 505), and it is determined whether the output value of the threshold function is greater than 0 (step 506). If it is greater than 0, the result is accepted (step 507); if it is smaller than 0, the result is rejected (step 508).
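Steps 504 to 508 can be sketched as follows. The Euclidean distance is an assumption (the patent does not name the metric), the function name is hypothetical, and a value of exactly 0 is treated as a rejection, following the earlier statement that a threshold function value of 0 or less is rejected.

```python
import math
from typing import Sequence, Tuple


def detailed_decision(
    features: Sequence[float],
    ref_first: Sequence[float],
    ref_second: Sequence[float],
    a: float,
    b: float,
) -> Tuple[str, float, float]:
    """Return (decision, D, B) for one pair of rough classification candidates."""
    d1 = math.dist(features, ref_first)    # distance to the first-ranked candidate
    d2 = math.dist(features, ref_second)   # distance to the second-ranked candidate
    D = min(d1, d2)                        # step 504: absolute distance
    B = abs(d1 - d2)                       # step 504: relative distance
    value = B - a * D - b                  # step 505: threshold function
    if value > 0:                          # steps 506 and 507: accept the closer one
        return ("first" if d1 <= d2 else "second"), D, B
    return "reject", D, B                  # step 508


accepted = detailed_decision([0.0, 0.0], [3.0, 4.0], [0.0, 1.0], a=1.0, b=1.0)
rejected = detailed_decision([0.0, 0.0], [3.0, 4.0], [0.0, 1.0], a=1.0, b=5.0)
```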

[0048] The character feature extraction process for an input character is now described. First, as shown in FIG. 10, the image input unit reads the image to be recognized (step 301).

[0049] Next, the read image is smoothed (step 302) and then passed through an edge extraction operator (303) to detect the density gradient strength at the pixel of interest. Of the eight directions, the one with the greatest strength is taken as the density gradient direction of the pixel of interest, and that strength is taken as its density gradient strength value. A threshold for determining the feature quantities is then set from the density gradient strength distribution (step 304), the pattern is divided into M × N regions (step 305), the feature values are totaled for each region (step 306), and the feature quantities are calculated for the unweighted and weighted cases (steps 307 and 308).
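A simplified version of this extraction is sketched below in pure Python. The patent does not name the edge operator; this example assumes eight Sobel-based compass kernels, whose maximum response of 4 × 255 = 1020 on a 256-level image is consistent with the roughly 1024 strength levels mentioned earlier. Smoothing (step 302) and threshold-based normalization (step 304) are omitted, and all names are hypothetical.

```python
from typing import List

# Outer ring of a 3x3 kernel in clockwise order, and the Sobel "east" weights.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
BASE = [-1, 0, 1, 2, 1, 0, -1, -2]


def compass_kernels() -> List[List[List[int]]]:
    """Eight 3x3 kernels obtained by rotating the base weights in 45-degree steps."""
    kernels = []
    for shift in range(8):
        k = [[0] * 3 for _ in range(3)]
        for i, (r, c) in enumerate(RING):
            k[r][c] = BASE[(i - shift) % 8]
        kernels.append(k)
    return kernels


def gradient(image, row, col, kernels):
    """Direction index (0-7) with the largest response, and that response."""
    responses = [
        sum(k[dr][dc] * image[row - 1 + dr][col - 1 + dc]
            for dr in range(3) for dc in range(3))
        for k in kernels
    ]
    direction = max(range(8), key=lambda d: responses[d])
    return direction, responses[direction]


def region_histograms(image, m, n, weighted):
    """Per-region 8-bin direction histograms over an m x n grid (steps 305-308)."""
    kernels = compass_kernels()
    h, w = len(image), len(image[0])
    hist = [[[0] * 8 for _ in range(n)] for _ in range(m)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            direction, strength = gradient(image, r, c, kernels)
            if strength <= 0:
                continue  # flat area: no gradient to count
            # Weighted: accumulate the strength; unweighted: count the pixel.
            hist[r * m // h][c * n // w][direction] += strength if weighted else 1
    return hist


# A 4x4 image with a vertical edge between the second and third columns.
edge = [[0, 0, 255, 255] for _ in range(4)]
```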

[0050] The processing example in FIG. 9 covered the first-ranked rough classification candidate and the second-ranked candidate (of a different category from the first). The case where the range of candidates is extended is described next.

[0051] First, detailed classification process 1 (FIG. 9) is applied to every combination of a candidate character in the same category as the first-ranked rough classification candidate and a candidate character in a different category. However, the decision by threshold function, that is, rejection, is not applied.

[0052] If there is a candidate character that wins every detailed classification against the candidate characters of other categories, that is, a candidate character whose distance value is smaller than that of every candidate it is compared with, the category of that candidate character is output; otherwise the input is rejected.
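The "wins every comparison" rule can be expressed compactly. In this sketch `beats(c, o)` is a hypothetical callback that returns True when candidate `c` has the smaller detailed classification distance in its pairing with `o`; the candidate labels and the win table are invented for illustration.

```python
from typing import Callable, List, Optional


def pick_all_winner(
    same_category: List[str],
    other_category: List[str],
    beats: Callable[[str, str], bool],
) -> Optional[str]:
    """Return the first same-category candidate that beats every other-category
    candidate, or None (reject) if there is no such candidate."""
    for cand in same_category:
        if all(beats(cand, other) for other in other_category):
            return cand
    return None


# Hypothetical win table: only "B0" wins all three of its comparisons.
wins = {("B0", "C6"), ("B0", "E6"), ("B0", "G8"), ("A0", "C6")}
winner = pick_all_winner(
    ["A0", "B0", "D0", "F0"], ["C6", "E6", "G8"], lambda c, o: (c, o) in wins
)
```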

[0053] As an example, the processing is applied to the candidates down to the rank at which three different candidate categories have appeared in the rough classification. As shown in the following table, the detailed classification process of FIG. 9 is carried out for every combination of the candidate characters in the same category as the first-ranked rough classification candidate "font A, category 0", namely "font B, category 0", "font D, category 0", and "font F, category 0", with the candidate characters in different categories, namely "font C, category 6", "font E, category 6", and "font G, category 8".

[0054] In this table, a detailed classification distance value smaller than that of the candidate character being compared is marked as a win (○) and a larger one as a loss (×). The table shows that the second-ranked rough classification candidate "font B, category 0" had a smaller detailed classification distance value than every candidate character of a different category and so won every comparison. The final candidate character is therefore output as font B, category 0.

[0055] Another processing example is described next. First, the round-robin detailed classification process 1 (FIG. 9) is applied among the candidate characters down to rank n of the rough classification. However, the decision by threshold function, that is, rejection, is not applied.

[0056] As in the preceding example, if there is a candidate character that wins every detailed classification against the candidate characters of other categories, that is, a candidate character whose distance value is smaller than that of every candidate it is compared with, the category of that candidate character is output; otherwise the input is rejected. The input is also rejected if more than one category wins every comparison.
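The round-robin variant, including the extra rejection when more than one category wins everything, can be sketched as below. Candidates are (font, category) pairs, and `beats` is again a hypothetical callback standing in for the pairwise detailed classification.

```python
from typing import Callable, List, Optional, Tuple

Candidate = Tuple[str, str]  # (font, category)


def round_robin_category(
    candidates: List[Candidate],
    beats: Callable[[Candidate, Candidate], bool],
) -> Optional[str]:
    """Return the single category that has an all-winning candidate, else None."""
    winning_categories = {
        cand[1]
        for cand in candidates
        # Only comparisons against candidates of a different category count.
        if all(beats(cand, other) for other in candidates if other[1] != cand[1])
    }
    if len(winning_categories) == 1:
        return next(iter(winning_categories))
    return None  # no all-winner, or several categories tie: reject


top5 = [("A", "0"), ("B", "0"), ("C", "6"), ("E", "6"), ("G", "8")]
category = round_robin_category(top5, lambda x, y: x == ("B", "0"))
```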

[0057] As an example, the processing is applied to the top five rough classification candidates. In this table, a detailed classification distance value smaller than that of the candidate character being compared is marked as a win (○) and a larger one as a loss (×). The table shows that the second-ranked rough classification candidate "font B, category 0" had a smaller detailed classification distance value than every candidate character of a different category and so won every comparison. The final candidate character is therefore output as font B, category 0.

[0058] With this method, the decision for each combination of rough classification results can be made using the feature extraction method suited to that classification, so overall recognition accuracy can be improved.

[0059] For example, when a 0 with a partial stain is input as shown in FIG. 11(a), the rough classification distance values using character features with unweighted feature values satisfy "8" < "0", as shown in FIG. 11(b), and the output is an error. With detailed classification using the partial feature with weighted feature values, as in FIG. 11(c), the distance values satisfy "8" > "0", and the output is either correct or rejected.

[0060] The following table shows the result of the detailed classification dictionary creation process (FIG. 5) for the combination in which the first-ranked rough classification candidate is category 8 (font i) and the second-ranked candidate is category 0.

[0061] The feature extraction method with the smallest evaluation value V, that is, with low reject and error rates, was the weighted feature value with the partial feature. The detailed classification dictionary stores the feature extraction method and feature quantities, together with the threshold function and the evaluation value V = 5.0.

[0062] In fact, examination of the input patterns that tend to cause rejections and errors in the rough classification output for the combination of category 8 (font i) in first place and category 0 in second place showed that, as in FIG. 11, the center of the 0 tended to be partially stained or to carry a background design. For such cases, classification is thought to become possible if the unstable information from stains and background designs can be expressed as ambiguity.

[0063] In this case, a feature extraction method that expresses unstable feature values such as blurring and stains as multi-level values and uses that ambiguity in the classification decision is selected automatically, so recognition accuracy improves.

[0064] Similarly, when the input category is 3 as shown in FIG. 12(a), the first-ranked rough classification result is category 8 of font j, so the rough classification result is an error, but detailed classification yields the correct answer and recognition accuracy is improved.

[0065] That is, as shown in FIG. 12(b), the rough classification distance values using character features with unweighted feature values satisfy "8" < "3" and the output is an error, whereas with left-right symmetry feature 2 with weighted feature values, as in FIG. 12(c), the detailed classification distance values satisfy "8" > "3" and the output is correct.

[0066] The following table shows the result of the detailed classification dictionary creation process (FIG. 5) for the combination in which the first-ranked rough classification candidate is category 8 (font j) and the second-ranked candidate is category 3.

[0067] The feature extraction method with the smallest evaluation value V, that is, with low reject and error rates, was the weighted feature value with left-right symmetry feature 2. The detailed classification dictionary stores the feature extraction method (weighted feature value, left-right symmetry feature 2) and feature quantities, together with the threshold function and the evaluation value V = 18.2.

【００６８】実際に、大分類一位がカテゴリ８（フォン
トｊ）で二位がカテゴリ３の組み合わせについて、大分
類の出力結果で，リジェクト、エラーが発生しやすい入
力パターンの内容を確認すると、図１２のように、８に
類似した３という傾向があった。Actually, for the combination of category 8 (font j) in the first major category and category 3 in the second major category, the output result of the major category confirms the contents of the input pattern that is likely to cause rejection or error. Like 12, there was a tendency for 3 to be similar to 8.

[0069] When the major classification yields category 8 (font j) in first place and category 3 in second place, it might seem sufficient to focus on the features of the left-center part of the character, where candidate categories 8 and 3 differ most. However, the present method, which takes variations such as positional shifts into account, makes it clear that a feature extraction method expressing the left-right symmetry of the central part of the character is better suited to improving recognition accuracy.

[0070] Likewise, when the first-ranked major classification result is category 5 of font k and, as shown in FIG. 13(a), the input category is 6 (with blurring and a shifted character center), the major classification result is an error, but the detailed classification yields the correct answer, so the recognition accuracy can be improved.

[0071] That is, as shown in FIG. 13(b), the distance values in the major classification, which uses character features without feature value weighting, satisfy "5" < "6", so the output result is an error. With diagonal symmetry feature 1 with feature value weighting, as shown in FIG. 13(c), the distance values in the detailed classification become "6" > "5", and the output result is either the correct answer or a reject.

[0072] The following table shows the result of the detailed classification dictionary creation process (FIG. 5) for the combination in which the first-ranked major classification result is category 5 (font k) and the second-ranked result is category 6.

[0073] The feature extraction method with the smallest evaluation value V, that is, the one with the lowest reject and error rates, was diagonal symmetry feature 1 with feature value weighting. The detailed classification dictionary stores the following:
・the feature extraction method (with feature value weighting, diagonal symmetry feature 1) and the feature quantities
・the threshold function and the evaluation value V = 23.2

[0074] In fact, for the combination in which the first-ranked major classification result is category 5 (font k) and the second-ranked result is category 6, examining the input patterns that tended to cause rejects or errors in the major classification output showed that, as in FIG. 13, the left part of the 6 tended to be blurred and the character center tended to be shifted.

[0075] When the major classification yields category 5 (font k) in first place and category 6 in second place, comparing the candidates by a feature of the left part of the character (unweighted, partial feature), where categories 5 and 6 differ most, shows the upper-left part of the input pattern resembling category 6 and the lower-left part resembling category 5, so no decision can be made. However, with diagonal symmetry feature 1 between the upper-left and lower-right parts (FIG. 13), which the method of the present invention selects automatically, a decision becomes possible because the feature exploits the fact that category 5 lacks this symmetry whereas category 6 has it.
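
The exact definition of the diagonal symmetry feature is not given in this passage. As a rough sketch only, one way to score symmetry between the upper-left and lower-right parts of a binary character image is to compare each upper-left pixel with its point reflection through the image center. The function `diagonal_symmetry` and the two tiny test bitmaps are hypothetical and do not represent actual glyphs of 5 or 6.

```python
def diagonal_symmetry(image):
    """Fraction of upper-left-quadrant pixels that equal their point
    reflection in the lower-right quadrant (1.0 means fully symmetric)."""
    height, width = len(image), len(image[0])
    quad_h, quad_w = height // 2, width // 2
    matches = 0
    for r in range(quad_h):
        for c in range(quad_w):
            # (r, c) reflects through the image center to the lower right.
            if image[r][c] == image[height - 1 - r][width - 1 - c]:
                matches += 1
    return matches / (quad_h * quad_w)

# Hypothetical 4x4 bitmaps (1 = ink, 0 = background).
symmetric = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
asymmetric = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

print(diagonal_symmetry(symmetric))   # 1.0
print(diagonal_symmetry(asymmetric))  # 0.0
```

A score of this kind depends on the relation between two regions rather than on either region alone, which may make it less sensitive to the upper-left and lower-left ambiguity described in the paragraph above.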

[0076] It is also possible to classify with a plurality of character feature extraction methods and to use each of their output values in the classification decision.
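
The text does not say how the outputs of several feature extraction methods would be combined. The sketch below assumes one simple possibility, summing the per-method distances for each category and choosing the smallest total. The function `combine_distances` and all distance values are hypothetical.

```python
def combine_distances(per_method):
    """Sum each category's distance over all feature extraction methods."""
    totals = {}
    for distances in per_method.values():
        for category, distance in distances.items():
            totals[category] = totals.get(category, 0.0) + distance
    return totals

# Hypothetical distances from two feature extraction methods.
per_method = {
    "left-right symmetry 2": {"8": 0.52, "3": 0.30},
    "diagonal symmetry 1": {"8": 0.44, "3": 0.35},
}

totals = combine_distances(per_method)
best = min(totals, key=totals.get)
print(best)  # "3" has the smaller summed distance
```

Other combination rules, such as voting or weighting each method by its evaluation value V, would fit the same sentence equally well.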

[0077]

[Effect of the Invention] As described above, according to the present invention, even when recognition targets or character feature extraction methods are added, it suffices to perform recognition and evaluation with the dictionary and feature extraction methods after the addition. Dictionary maintenance on the dictionary creation side is therefore easy, and high-accuracy recognition can be achieved easily.

[Brief description of drawings]

FIG. 1 is an explanatory diagram showing the method of the present invention.

FIG. 2 is a flowchart of the creation of the major classification dictionary according to the present invention.

FIG. 3 is a diagram showing an example of input character data.

FIG. 4 is a diagram showing examples of generated deformed patterns.

FIG. 5 is a flowchart of the creation of the detailed classification dictionary according to the present invention.

FIG. 6 is a diagram showing the distance value frequency distributions of the correct-answer group and the error group.

FIG. 7 is an explanatory diagram of threshold function determination.

FIG. 8 is a diagram showing an example of threshold function determination.

FIG. 9 is a flowchart showing an example of detailed classification processing.

FIG. 10 is a flowchart of character feature extraction.

FIG. 11 is a diagram showing recognition processing using the device of the present invention.

FIG. 12 is a diagram showing recognition processing using the device of the present invention.

FIG. 13 is a diagram showing recognition processing using the device of the present invention.

Claims

[Claims]

1. A pattern recognition device comprising: dictionary creating means for setting a plurality of feature extraction methods for the detailed classification that follows the major classification of a character pattern, calculating a classification determination threshold for each feature extraction method, evaluating the classification accuracy, and storing, as a dictionary, the feature extraction method having the smallest evaluation value of classification accuracy together with its determination threshold and its evaluation value of classification accuracy; and pattern recognition means for performing detailed classification by reading from the dictionary the feature extraction method and determination threshold suited to classification between similar candidates in the major classification result.