JPH07111732B2

JPH07111732B2 - Dictionary creation device for character and figure recognition

Info

Publication number: JPH07111732B2
Application number: JP62192926A
Authority: JP
Inventors: 穂高倉; 裕文曽根
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1987-07-31
Filing date: 1987-07-31
Publication date: 1995-11-29
Anticipated expiration: 2010-11-29
Also published as: JPS6436388A

Description

Detailed Description of the Invention

Field of Industrial Application

The present invention relates to character and figure recognition. In particular, it relates to a method of creating a dictionary for a character and figure recognition device that extracts recognition features from input character or figure data, compares the extracted features with the standard features of a plurality of character and figure data items stored in a dictionary, and outputs as the recognition result the character or figure data whose standard features match the extracted features most closely.

Prior Art

One conventional approach to character and figure recognition extracts several features from the character or figure to be recognized, compares this set of features (called a feature vector, or simply a feature) with the feature vectors of a plurality of reference patterns stored in a dictionary, and takes the reference pattern with the most similar feature vector as the recognition result.

However, a single reference pattern cannot simply be fixed when creating a dictionary for handwritten character and figure recognition, where the shape of the same character or figure varies from writer to writer and from one writing to the next, or for multi-font character recognition, which must accept several glyph shapes. In such cases, a reference feature vector is generally determined for each kind (category) of character or figure, and the dictionary is created, as follows. Feature vectors are extracted from a plurality of patterns of the same category (for example, the same character or figure handwritten by several people), and the average of the extracted feature vectors is taken as the reference feature vector of that category and stored in the dictionary as its dictionary data. The patterns used to create the dictionary are called learning data.

Consider the case where two features (f1, f2), obtained for each of the learning data A1, A2, ..., Am (belonging to category Ca) and B1, B2, ..., Bn (belonging to category Cb), are distributed as shown in FIG. 2 in the feature space spanned by f1 and f2. In FIG. 2, the feature vectors of data belonging to Ca are drawn as black dots and those of data belonging to Cb as white triangles. The solid lines indicate the boundaries of the two category regions.

For the purpose of explanation, the feature vector is assumed here to consist of the two features (f1, f2). In general, however, the number of features is not limited to two, and many more are used.

The reference feature vectors of the two categories obtained by the method described above are shown in the figure as the × marks Ao and Bo.

Here the feature vector of B3 is closer to Ao than to Bo, that is, more similar to Ao than to Bo, even though B3 belongs to category Cb. Consequently, if recognition is performed with Ao and Bo as the dictionary data, B3 is misrecognized as Ca.

In such a case, misrecognition can be prevented by dividing a category such as Cb into the two groups Cb1 and Cb2 indicated by the dotted lines in the figure, computing the average feature vectors B1o and B2o from the learning data belonging to each group, and registering both in the dictionary as dictionary data for category Cb.
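
The effect of splitting Cb can be illustrated with a small numeric sketch. The coordinates below are invented for illustration and are not taken from FIG. 2, the helper names `mean` and `nearest_category` are hypothetical, and Euclidean distance is assumed as the similarity measure.

```python
import math

def mean(vectors):
    """Component-wise average of equal-length feature vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))

def nearest_category(x, dictionary):
    """Return the category of the dictionary entry closest to x."""
    return min(dictionary, key=lambda entry: math.dist(x, entry[1]))[0]

# Invented learning data: Ca is compact, Cb falls into two distant clumps.
ca = [(1.0, 1.0), (1.5, 0.5), (0.5, 1.5)]
cb1 = [(3.0, 9.0), (3.5, 9.0)]
cb2 = [(3.0, 1.0), (3.5, 1.0)]
b3 = cb2[0]  # a Cb sample that lies near Ca

# One reference vector per category (Ao, Bo).
single = [("Ca", mean(ca)), ("Cb", mean(cb1 + cb2))]
# Cb split into two groups, both registered under category Cb (B1o, B2o).
split = [("Ca", mean(ca)), ("Cb", mean(cb1)), ("Cb", mean(cb2))]

result_single = nearest_category(b3, single)
result_split = nearest_category(b3, split)
print(result_single, result_split)
```

With these values the single-reference dictionary assigns `b3` to Ca, while the split dictionary assigns it to Cb, because the average of the whole of Cb lies far from both of its clumps.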

Thus, by dividing a category into several groups and increasing the number of dictionary data for that category, a dictionary for handwritten character recognition with a higher recognition rate can be created.

When a dictionary is created in this way, the recognition rate obtained with the dictionary generally rises as the number of groups per category, that is, the number of dictionary data, increases.

During recognition, however, the feature vector of the pattern to be recognized is compared (by distance calculation) with every dictionary data item stored in the dictionary, so the time required for recognition grows with the total number of dictionary data. From the standpoint of recognition time, therefore, the fewer dictionary data per category the better.

Under these two requirements, the optimal number of groups is the smallest number of groups that achieves a high recognition rate. Because the feature distribution of the learning data generally differs from category to category, the optimal number of groups naturally differs from category to category as well.

The optimal number of groups for a category is determined not only by the feature distribution of the learning data belonging to that category but also by its relationship to the distributions of other categories that lie adjacent to it in the feature space. Determining the optimal number of groups for a category therefore requires knowledge of the feature distribution of the learning data of all categories. Since dictionary creation generally uses an enormous number of learning data, working out the mutual relationships among all of them and determining the optimal number of groups for each category would require an enormous amount of computation and was not feasible.

For this reason, it has conventionally not been possible to group each category by the number of groups best suited to it, and grouping has been performed with the same number of groups for every category.

Furthermore, when recognition was actually performed with a dictionary created in this way and the recognition rate proved unsatisfactory, the number of groups was adjusted manually.

Problems to Be Solved by the Invention

The conventional method described above, which groups every category with the same number of groups, generally produces redundant groupings, so the number of groups is often large relative to the recognition rate achieved. Conversely, reducing the number of groups lowers the recognition rate considerably. In addition, manually adjusting the result requires a great deal of time.

In view of the prior art described above, the present invention provides a dictionary creation device for character and figure recognition that has a simple configuration and is capable of high-speed processing.

Means for Solving the Problems

To solve the above problems, the dictionary creation device for character and figure recognition of the present invention comprises: recognition means that extracts recognition features from a plurality of character and figure data items prepared in advance for evaluation, compares the extracted features with the standard features of a plurality of character and figure data items stored in a dictionary, takes the character or figure data whose standard features match most closely as the recognition result, and judges whether each recognition result is correct to obtain a recognition rate for each category; initial dictionary creation means that extracts recognition features from a plurality of learning character and figure data items for each category, calculates a standard feature for each category, and creates an initial dictionary in which the standard features are stored as dictionary data; category data dividing means that, for each category whose recognition rate in the recognition of the evaluation data by the recognition means is lower than a predetermined recognition rate, increases the number of groups of that category and groups the learning character and figure data belonging to it; and dictionary updating means that calculates the standard feature of each group from the recognition features of the learning data belonging to each group produced by the category data dividing means, and creates a dictionary in which the standard features of the divided category are replaced with the standard features of the groups produced by the category data dividing means.

The recognition means uses the initial dictionary created by the initial dictionary creation means for the first recognition and thereafter uses the latest dictionary produced by the dictionary updating means. Recognition by the recognition means, division of low-recognition-rate categories by the category data dividing means, and updating of the dictionary data of those categories by the dictionary updating means are repeated until a preset termination condition is satisfied.

Operation

With the above arrangement, the present invention adjusts the number of groups of each category while feeding back the recognition results, and creates a dictionary for handwritten character and figure recognition that achieves a high recognition rate with a small number of dictionary data.

Embodiment

An embodiment of the present invention is described below with reference to FIG. 1.

FIG. 1 is a block diagram of the dictionary creation device for character and figure recognition of this embodiment. Reference numeral 1 denotes an initial dictionary creation unit, 2 a recognition unit, 3 a dictionary updating unit, 4 a control unit, 5 a learning data storage memory, 6 a recognition data storage memory, 7 a dictionary memory, and 8 a recognition result memory.

The function of each unit is described in detail below.

The initial dictionary creation unit 1 generates the initial dictionary by the following procedure.

(1) Features are extracted according to a predetermined rule from the learning character and figure data stored in the learning data storage memory 5. A plurality of learning data items is assumed to have been prepared in advance for each category in the learning data storage memory 5. Because an extracted feature generally consists of several elements, it is handled in this embodiment as a multidimensional feature vector.

(2) The feature vectors obtained for the individual learning data are averaged for each category to give the reference feature.

(3) The reference feature vectors obtained for each category in (2) are stored as dictionary data in the dictionary memory 7 in a predetermined format, and the dictionary is thereby created.

The recognition unit 2 performs recognition using the latest dictionary in the dictionary memory 7 and calculates the recognition rate. The character and figure data to be recognized may be the learning data in the learning data storage memory 5, or character and figure data for recognition may be prepared separately. When recognition data is prepared separately, the recognition data storage memory 6 is provided and the prepared data is stored there. In that case, however, the category to which each data item belongs must be known so that the recognition rate can be computed.

The recognition rate is the proportion of correctly recognized data among all data to be recognized. In the narrow sense, correct recognition means that the category of the dictionary data most similar (smallest in distance) to the feature vector of the data to be recognized, that is, the first-rank recognition result, coincides with the category to which that data belongs. It may also be defined in a broader sense, as follows. Using the similarity (distance) between the feature vector of the data to be recognized and the dictionary data as the criterion, several categories are selected as candidates in descending order of similarity (ascending order of distance), and the data is regarded as correctly recognized if the category to which it belongs is among the candidate categories. For example, the recognition rate based on the top three candidate categories is called the third-rank recognition rate.

In the present invention, either definition of the recognition rate may be adopted.

In the present invention, the recognition rate is obtained separately for each category. That is, the recognition rate of a category is the proportion of correctly recognized data among all data to be recognized that belong to that category.
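
The per-category recognition rate, in both its narrow (first-rank) and broad (top-k candidate) forms, might be computed along the following lines. This is only a sketch: the function names are hypothetical, Euclidean distance is assumed, and ranking each category by its closest dictionary entry is one possible reading of how candidate categories are selected.

```python
import math
from collections import defaultdict

def ranked_categories(x, dictionary):
    """Distinct categories ordered by the distance of their closest entry to x."""
    best = {}
    for category, reference in dictionary:
        d = math.dist(x, reference)
        if category not in best or d < best[category]:
            best[category] = d
    return sorted(best, key=best.get)

def category_recognition_rates(samples, dictionary, k=1):
    """Fraction of each category's samples whose true category is in the top k."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for x, true_category in samples:
        total[true_category] += 1
        if true_category in ranked_categories(x, dictionary)[:k]:
            correct[true_category] += 1
    return {c: correct[c] / total[c] for c in total}

# Invented dictionary and labelled evaluation data.
dictionary = [("Ca", (1.0, 1.0)), ("Cb", (3.25, 5.0))]
samples = [
    ((1.0, 1.0), "Ca"), ((1.5, 0.5), "Ca"),
    ((3.0, 9.0), "Cb"), ((3.0, 1.0), "Cb"),
]
first_rank = category_recognition_rates(samples, dictionary, k=1)
second_rank = category_recognition_rates(samples, dictionary, k=2)
print(first_rank, second_rank)
```

Here the first-rank rate of Cb is 0.5 because one of its two samples is closer to the Ca entry, while the second-rank rate is 1.0 for both categories.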

The recognition rates thus obtained are stored in the recognition result memory 8.

The dictionary updating unit 3 creates a new dictionary by the following procedure.

(1) The recognition results of the recognition unit 2 stored in the recognition result memory 8 are consulted, and for each category with a poor recognition rate, the learning data belonging to that category is grouped again with a larger number of groups than that category had in the dictionary used for recognition. The process of collecting similar items on the basis of the similarity among a plurality of data items and dividing the data into several groups is called clustering. Known clustering algorithms include the shortest distance method, the longest distance method, the centroid method, the median method, and Ward's method (Yutaka Tanaka and Kazumasa Wakimoto, "Multivariate Statistical Analysis Methods", Gendai Sugakusha). Since the choice of clustering algorithm is immaterial to the present invention, the grouping is performed with any one of the above algorithms.
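
As one illustration of the grouping step, the sketch below follows the idea of the centroid method: every sample starts as its own group, and the two groups whose centroids are closest are merged until the requested number of groups remains. The function names and sample values are hypothetical, Euclidean distance is assumed, and any of the other listed algorithms could be substituted.

```python
import math
from itertools import combinations

def centroid(points):
    """Component-wise average of a group of feature vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def centroid_clustering(points, num_groups):
    """Merge the two groups with the closest centroids until num_groups remain."""
    groups = [[p] for p in points]
    while len(groups) > num_groups:
        i, j = min(
            combinations(range(len(groups)), 2),
            key=lambda ij: math.dist(centroid(groups[ij[0]]), centroid(groups[ij[1]])),
        )
        merged = groups.pop(j)  # j > i, so index i stays valid after the pop
        groups[i] = groups[i] + merged
    return groups

# Invented learning data for one category that falls into two clumps.
cb = [(3.0, 9.0), (3.5, 9.0), (3.0, 1.0), (3.5, 1.0)]
groups = centroid_clustering(cb, 2)
references = [centroid(g) for g in groups]
print(groups, references)
```

This brute-force version recomputes every centroid on each pass, which is acceptable for a handful of samples but would need a more efficient formulation for the large learning sets the text describes.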

The number of groups may be increased by a fixed amount or multiplied by a fixed factor (for example, doubled); either method may be used.
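
The two policies for increasing the group count can be expressed as a small helper. The function name, the default step of 1, and the cap at the number of learning samples are all assumptions made for this sketch; the text does not specify them.

```python
def next_group_count(current, num_samples, mode="add", step=1, factor=2):
    """Return the group count to try next for a low-recognition-rate category."""
    if mode == "add":
        proposed = current + step      # fixed increment (step is an assumed default)
    elif mode == "multiply":
        proposed = current * factor    # fixed multiple, e.g. doubling
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Assumption: a category cannot usefully have more groups than samples.
    return min(proposed, num_samples)

added = next_group_count(2, num_samples=10)
doubled = next_group_count(2, num_samples=10, mode="multiply")
capped = next_group_count(8, num_samples=10, mode="multiply")
print(added, doubled, capped)
```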

(2) For each group formed in (1), the average of the feature vectors of the learning data belonging to the group is taken as a reference feature, and these reference features become the new dictionary data of that category.

(3) For categories that were not changed, the dictionary data of the dictionary used for recognition is carried over unchanged.

(4) The dictionary data of (2) and (3) is stored in the dictionary memory 7 in a predetermined format, and a new dictionary is thereby created.

The control unit 4 judges the termination condition and controls the repetition.

The operating procedure of the dictionary creation device of this embodiment is described below, again with reference to FIG. 1.

(1) The initial dictionary creation unit 1 creates the initial dictionary and stores it in the dictionary memory 7.

(2) The recognition unit 2 performs recognition using the initial dictionary in the dictionary memory 7 and stores the recognition results in the recognition result memory 8.

(3) The control unit 4 judges the termination condition. If it is not satisfied, processing proceeds to (4); if it is satisfied, processing proceeds to (7).

(4) On the basis of the recognition results stored in the recognition result memory 8, the dictionary updating unit 3 creates a new dictionary with an increased number of dictionary data and stores it in the dictionary memory 7.

(5) The recognition unit 2 performs recognition using the latest dictionary in the dictionary memory 7, newly created by the dictionary updating unit 3, and stores the recognition results in the recognition result memory 8.

(6) Thereafter, steps (3), (4), and (5) are repeated until the preset termination condition is satisfied in (3).

(7) When the termination condition is satisfied in (3), the repetition ends, and the latest dictionary in the dictionary memory 7 at that point is output as the final result.
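
Steps (1) to (7) can be sketched as a single loop. This is an illustrative reading rather than the patented implementation: all function names are hypothetical, Euclidean distance and centroid-method grouping are assumed, the group count grows by one per round, and a category is no longer split once it has one group per learning sample.

```python
import math
from itertools import combinations

def mean(vectors):
    """Component-wise average of equal-length feature vectors."""
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(len(vectors[0])))

def cluster(points, k):
    """Centroid-method grouping: merge the two closest groups until k remain."""
    groups = [[p] for p in points]
    while len(groups) > k:
        i, j = min(
            combinations(range(len(groups)), 2),
            key=lambda ij: math.dist(mean(groups[ij[0]]), mean(groups[ij[1]])),
        )
        merged = groups.pop(j)  # j > i, so index i stays valid
        groups[i] = groups[i] + merged
    return groups

def recognize(x, dictionary):
    """First-rank recognition: category of the closest dictionary entry."""
    return min(dictionary, key=lambda entry: math.dist(x, entry[1]))[0]

def category_rates(evaluation, dictionary):
    """Per-category fraction of correctly recognized evaluation samples."""
    return {
        c: sum(recognize(x, dictionary) == c for x in xs) / len(xs)
        for c, xs in evaluation.items()
    }

def flatten(entries):
    """Turn {category: [reference, ...]} into a flat list of dictionary data."""
    return [(c, ref) for c, refs in entries.items() for ref in refs]

def build_dictionary(training, evaluation, target_rate=0.995, max_rounds=10):
    # Step (1): initial dictionary with one averaged reference per category.
    entries = {c: [mean(xs)] for c, xs in training.items()}
    for _ in range(max_rounds):
        # Steps (2)/(5): recognize with the latest dictionary.
        rates = category_rates(evaluation, flatten(entries))
        # Step (3): stop when no category is below target (or none can be split).
        weak = [c for c in training
                if rates.get(c, 1.0) < target_rate and len(entries[c]) < len(training[c])]
        if not weak:
            break
        # Step (4): regroup only the weak categories with one more group each.
        for c in weak:
            entries[c] = [mean(g) for g in cluster(training[c], len(entries[c]) + 1)]
    # Step (7): the latest dictionary is the result.
    return flatten(entries)

# Invented data; the learning data doubles as evaluation data, which the text permits.
training = {
    "Ca": [(1.0, 1.0), (1.5, 0.5), (0.5, 1.5)],
    "Cb": [(3.0, 9.0), (3.5, 9.0), (3.0, 1.0), (3.5, 1.0)],
}
result = build_dictionary(training, training)
print(result)
```

On this data only Cb is split, once, so the final dictionary holds one entry for Ca and two for Cb; categories that already meet the target keep their single reference vector.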

The termination condition is set in terms of the recognition rate or the number of repetitions. Examples include "end the repetition when the recognition rate of every category reaches 99.5% or more" and "repeat at most 10 times". Such conditions may also be combined.

Effects of the Invention

It is very difficult to know the optimal number of groups for each category in advance. According to the present invention, however, adjusting the number of groups of each category while feeding back the recognition results yields a grouping whose number of groups is close to the optimum. As a result, a dictionary that achieves a high recognition rate with a small number of dictionary data, which is indispensable for a fast, high-performance character recognition device, can be created.

[Brief description of drawings]

FIG. 1 is a functional diagram of the present invention, and FIG. 2 is a diagram showing an example of the feature distribution of learning data.

1: initial dictionary creation unit, 2: recognition unit, 3: dictionary updating unit, 4: control unit, 5: learning data storage memory, 6: recognition data storage memory, 7: dictionary memory, 8: recognition result memory.

Claims

1. A dictionary creation device for character and figure recognition, comprising: recognition means that extracts recognition features from a plurality of character and figure data items prepared in advance for evaluation, compares the extracted features with the standard features of a plurality of character and figure data items stored in a dictionary, takes the character or figure data whose standard features match most closely as the recognition result, and judges whether each recognition result is correct to obtain a recognition rate for each kind (category) of character or figure; initial dictionary creation means that extracts recognition features from a plurality of learning character and figure data items for each category, calculates a standard feature for each category, and creates an initial dictionary in which the standard features are stored as dictionary data; category data dividing means that, for each category whose recognition rate in the recognition of the evaluation character and figure data by the recognition means is lower than a predetermined recognition rate, increases the number of groups of that category and groups the learning character and figure data belonging to that category; and dictionary updating means that calculates the standard feature of each group from the recognition features of the learning character and figure data belonging to each group produced by the category data dividing means, and creates a dictionary in which the standard features of the divided category are replaced with the standard features of the groups produced by the category data dividing means; wherein the recognition means uses the initial dictionary created by the initial dictionary creation means for the first recognition and thereafter uses the latest dictionary produced by the dictionary updating means, and wherein recognition by the recognition means, division of low-recognition-rate categories by the category data dividing means, and updating of the dictionary data of the low-recognition-rate categories by the dictionary updating means are repeated until a preset termination condition is satisfied.

2. The dictionary creation device for character and figure recognition according to claim 1, wherein the category data dividing means performs the grouping on the basis of similarity or distance, in the feature space, of the recognition features extracted from the individual learning character and figure data of the category to be divided.