JP2008071214A

JP2008071214A - Character recognition dictionary creation method and its device, character recognition method and its device, and storage medium in which program is stored

Info

Publication number: JP2008071214A
Application number: JP2006250250A
Authority: JP
Inventors: Yoshimasa Kimura; 義政木村
Original assignee: Kochi University of Technology
Current assignee: Kochi University of Technology
Priority date: 2006-09-15
Filing date: 2006-09-15
Publication date: 2008-03-27

Abstract

PROBLEM TO BE SOLVED: To provide a character recognition dictionary creation method and device that discriminate similar characters with a small amount of computation by selecting, from the original features, a small number of features that contribute to identification; a character recognition method and device using the same; and a storage medium storing the character recognition dictionary creation program and the character recognition program. SOLUTION: A genetic algorithm is applied to feature selection. Generational change is applied only to individuals whose recognition rate is at or above a fixed value, with the feature-count reduction ratio used as the fitness, so that a small number of features with high discriminating power is obtained. In detailed identification, the features are read from the identification dictionary by the feature numbers stored in the detailed identification dictionary, which yields a character recognition system with high accuracy and a small dictionary size. COPYRIGHT: (C) 2008, JPO & INPIT

Description

The present invention relates to a character recognition dictionary creation method and apparatus, a character recognition method and apparatus, and a storage medium storing a character recognition dictionary creation program and a character recognition program.

A recognition system consisting of preprocessing, feature extraction, and identification is commonly used for conventional character recognition (see, for example, Non-Patent Document 1). In the preprocessing unit, noise in the input character pattern is removed and the position and size of the character pattern are normalized. The feature extraction unit extracts features that represent the essence of the character pattern; these features are determined in advance according to the type of character pattern. In the identification unit, a known method is to prepare standard patterns, each a typical pattern of a category to be recognized, compute a measure of closeness between the input character pattern and each standard pattern, and output the category with the closest measure as the recognition result.

In handwritten kanji recognition, it has been demonstrated that a high recognition rate is obtained by using directional features of character strokes as the features and a statistical identification method as the classifier. However, even when the average recognition rate over all kanji is high, the recognition rate of some individual characters is low. Many of these are similar characters with similar shapes, such as (問, 闇, 閤, 間) and (徴, 微).

To identify similar characters, a detailed identification unit dedicated to recognizing similar characters is therefore added to the recognition system. Detailed identification methods are broadly divided into structural analysis methods, which handle local information of the character pattern directly, and statistical methods, which handle features extracted from the character pattern.

Structural analysis methods include a method that extracts the strokes that differ between similar characters and uses them for identification (see Non-Patent Document 2), and a method that uses only the partial regions, such as radicals, in which the difference between similar characters appears (see Non-Patent Document 3). However, because strokes touch one another or break, it is difficult to extract local information such as strokes and partial regions.

Statistical methods are further divided into several approaches, each with its own problems. The method that identifies using only difference features, that is, features with a large difference between the standard patterns of similar characters (see Non-Patent Document 4), has the advantage that the original features can be used directly. However, difference features are selected solely on the criterion that the difference between similar characters is large; their stability is not examined, so discriminating power weakens when the features fluctuate. The subspace method (see Non-Patent Document 5) can reduce the number of features, but it considers only the feature distribution of its own category and therefore has no mechanism for examining the difference from other categories. The method using discriminant analysis (see Non-Patent Document 6) does examine the difference from other categories, but it assumes that the feature distribution is normal. The distribution of actual character patterns is often not normal. Moreover, when the number of categories to be identified is Q, only (Q-1) features effective for identification are obtained, so when Q is small, identification must be performed with very few features and becomes difficult. In addition, because the subspace method and discriminant analysis use as new features those obtained by applying a linear transformation to the original features, the amount of computation per feature increases.

Non-Patent Document 1: "Pattern Recognition", IEICE, 1988.
Non-Patent Document 2: "Handwritten Kanji Recognition by a Stroke Extraction Method Using Outer Structure Information", IEICE Transactions (D), vol. J67-D, no. 9, pp. 1052-1059, 1984.
Non-Patent Document 3: "A Method for Detailed Identification of Handwritten Characters by Automatic Learning of Partial Matching Regions", IEICE Transactions D-II, vol. J78-D-II, no. 3, pp. 492-500, 1995.
Non-Patent Document 4: "A Study on Detailed Identification of Handwritten Kanji Using Quantification Theory", IEICE General National Conference, fiscal year 1984 (Showa 59), no. 1620, 1984.
Non-Patent Document 5: "Identification of Similar Handwritten Kanji by a Superposition Method", IEICE Technical Report on Pattern Recognition and Understanding, PRU91-48, 1991.
Non-Patent Document 6: "Improving the Accuracy of Handwritten Character Recognition Using Discriminant Analysis", NTT R&D, vol. 43, no. 8, pp. 873-880, 1994.

As described above, structural analysis methods have difficulty extracting strokes and partial regions. Among statistical methods, difference features, which directly use features selected from the original features, are vulnerable to feature variation. The subspace method and discriminant analysis, which reduce the number of features and identify with a small number of them, rest on the assumption that the feature distribution is normal, so they cannot always select the features that truly contribute to identification. In addition, their computation per feature is larger than that of difference features, which use the original features directly.

The present invention has been made in view of the above. Its object is to provide a character recognition dictionary creation method and apparatus that select, from the original features, a small number of features that contribute to identification and thereby identify similar characters with a small amount of computation; a character recognition method and apparatus; and a storage medium storing a character recognition dictionary creation program and a character recognition program.

To achieve the above object, the present invention provides a character recognition dictionary creation method that uses learning data in which a large number of character patterns are collected per category, extracts features of the character patterns, and applies statistical processing to create a standard pattern for each category, the method being characterized mainly by:
preparing a plurality of genes, each a bit string of as many binary bits (“0” or “1”) as there are features, and creating an initial gene population by assigning “0” or “1” to each individual so that “1” occurs with a predetermined probability;
for every individual of the initial gene population, using only the features corresponding to the addresses whose bits are “1”, calculating an identification measure between the features of the standard patterns of designated similar characters and the features of the character patterns of the learning data for those similar characters, and obtaining the recognition rate of the individual;
creating, as a parent gene candidate set, the set of only those individuals whose recognition rate is equal to or higher than a predetermined recognition rate;
obtaining, for each individual of the parent gene candidate set, a fitness defined from the viewpoint of maintaining the predetermined recognition rate with a small number of features;
creating a parent gene set by selecting, using the fitness of each individual of the parent gene candidate set, the individuals that become the parent genes of the next generation;
repeating a predetermined number of times an operation of taking two individuals from the parent gene set, exchanging their genes by crossover to generate two new individuals, and returning these to the parent gene set;
taking an individual from the parent gene set and applying a mutation that inverts the “0”/“1” value of part of its gene with a predetermined probability;
treating the above series of recognition by individuals, fitness calculation, selection, crossover, and mutation as the processing of one generation, repeating generational change until either a predetermined convergence criterion is satisfied or a predetermined maximum number of generations is reached, and taking out the individual with the maximum fitness from the finally obtained parent gene set; and
creating a detailed identification dictionary for the similar characters using the category names of the designated similar characters and the feature numbers corresponding to the addresses whose bits are “1” in the individual with the maximum fitness.

The fitness is further characterized in that, taking the total number of “1” bits of an individual as its feature count, the maximum feature count among the individuals of the generation is obtained, and the fitness is the feature-count reduction ratio, defined as the number of features by which each individual falls below the maximum feature count divided by the maximum feature count.

To achieve the above object, the present invention also provides a character recognition method that outputs, as a candidate sequence in ascending order of a closeness measure calculated between the features of an input character pattern and the features of standard patterns, the categories to which the input character pattern may belong, the method being characterized by:
judging, from the identification measure of the first candidate and that of the second candidate obtained by identification, the reliability of the first candidate being the correct category;
outputting the first candidate as the recognition result when the reliability is judged to be high, and otherwise sending the candidates obtained by identification to a subsequent detailed identification unit as recognition targets;
in the detailed identification unit, searching a detailed identification dictionary to obtain the category set to which the first candidate belongs and the detailed identification features used to discriminate that category set; and
reading from the identification dictionary of the category set the features designated by the feature numbers of the detailed identification features, performing detailed identification using these features and the corresponding detailed identification features among the features obtained from the input character pattern, and outputting the obtained result as the recognition result.

The present invention has the following effects. The inventions of claims 1 and 4 perform detailed identification using features selected from those used for identification. Because the features are selected by a genetic algorithm so as to reduce the number of features while maintaining a fixed recognition rate, the selection differs both from difference features, which are selected using only the information in the standard patterns of similar characters, and from the subspace method and discriminant analysis, which select features from the feature distribution alone. The advantage is that a small number of features with high discriminating power is obtained.

The inventions of claims 2 and 5 are the inventions of claims 1 and 4 in which the fitness is the feature-count reduction ratio. The fitness then takes a wide range of values, which makes a near-optimal solution easier to find and speeds convergence.

In the inventions of claims 3 and 6, the detailed identification features are a subset of the identification features, and no separate features are defined for detailed identification. The features themselves therefore need not be stored in the detailed identification dictionary, which reduces the dictionary size and keeps the configuration of the recognition system consistent.

Embodiments of the present invention will be described below with reference to the drawings.

FIG. 1 is a block diagram of a character recognition dictionary creation apparatus according to an embodiment of the present invention. It comprises an input pattern memory unit 1, a preprocessing unit 2, a feature extraction unit 3, a feature selection unit 4, a detailed identification dictionary creation unit 5, and a detailed identification dictionary 6.

The input pattern memory unit 1 stores character patterns captured by an input device such as a scanner or a television camera. The preprocessing unit 2 performs normalization, noise removal, and the like. The feature extraction unit 3 extracts the features used for recognition from the input character pattern. The feature selection unit 4 selects, from these features, those needed for detailed identification. The detailed identification dictionary creation unit 5 stores in the detailed identification dictionary 6 the information obtained by attaching character codes to the selected features.

Next, the operation of the feature selection unit 4, which is the main part of the present invention, will be described with reference to FIG. 2. FIG. 2 is a block diagram showing an embodiment of the feature selection unit 4, which comprises an initial gene population creation circuit 41, an identification measure calculation circuit 42, a recognition rate calculation circuit 43, a parent gene candidate selection circuit 44, a fitness calculation circuit 45, a parent gene selection circuit 46, a crossover operation circuit 47, a mutation operation circuit 48, and a generational change continuation determination circuit 49.

The initial gene population creation circuit 41 creates the initial state of the genes. FIG. 3 shows this initial state: each individual is a bit string of d binary bits (“0” or “1”), and K such individuals form one population. The address of each bit of a gene corresponds to a feature number. When a bit is “0”, the feature corresponding to its address is not used for detailed identification; when it is “1”, the feature is used. The circuit 41 sets the initial state by assigning “0” or “1” to each individual so that “1” occurs with probability α (0 < α < 1). Applying the genetic algorithm to these genes reduces the number of “1” bits, and the features used for detailed identification are thereby selected.
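The initial state described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the use of Python lists for bit strings, and the 0-based feature numbering are assumptions for the example, not details given in the specification.

```python
import random

def create_initial_population(K, d, alpha, rng=None):
    """Return K individuals, each a list of d bits.

    Each bit is set to 1 with probability alpha (0 < alpha < 1),
    independently of the others.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    rng = rng or random.Random()
    return [[1 if rng.random() < alpha else 0 for _ in range(d)]
            for _ in range(K)]

def selected_feature_numbers(individual):
    """Addresses (0-based here, by assumption) whose bit is 1."""
    return [address for address, bit in enumerate(individual) if bit == 1]
```

For example, `selected_feature_numbers([1, 0, 1, 1])` yields the feature numbers 0, 2, and 3, which are the only features that individual would use for detailed identification.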

In generation g = 1, the identification measure calculation circuit 42 takes the first individual (k = 1) of the K individuals received from the initial gene population creation circuit 41 and, using only the features corresponding to the addresses whose bits are “1”, calculates the identification measure between the features of the standard patterns of the designated similar characters and the features of the character patterns of the learning data for those similar characters. The features of the input patterns are supplied by the feature extraction unit 3, and the features of the standard patterns of the similar characters are read from the identification dictionary 10. The identification measure is calculated for every character pattern of the designated similar characters in the learning data. When processing of the first individual is finished, the identification measure is calculated in the same way for the second and all subsequent individuals.

Using the identification measures of the similar characters obtained by the identification measure calculation circuit 42, the recognition rate calculation circuit 43 calculates the recognition rate γ_k of the k-th individual on the learning data for k = 1, 2, …, K, that is, for all K individuals.

Using the recognition rate γ_k of each individual obtained by the recognition rate calculation circuit 43, the parent gene candidate selection circuit 44 creates the set A(g) for generation g, consisting only of the individuals whose recognition rate is equal to or higher than a predetermined recognition rate γ_0, and treats these as parent gene candidates.
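The processing of circuits 42 to 44 can be illustrated as follows. The specification does not state which identification measure is used, so this sketch assumes a squared Euclidean distance restricted to the selected features and a nearest-standard-pattern decision; the function names and data layout (a dict of standard patterns, a list of labelled samples) are likewise hypothetical.

```python
def masked_distance(x, standard, individual):
    """Squared Euclidean distance over the features whose bit is 1
    (the distance itself is an assumption of this sketch)."""
    return sum((a - b) ** 2
               for a, b, bit in zip(x, standard, individual) if bit == 1)

def recognition_rate(individual, standards, samples):
    """Fraction of learning samples assigned to their true category.

    standards: dict mapping category -> standard-pattern feature vector
    samples:   list of (feature vector, true category) pairs
    """
    correct = 0
    for x, label in samples:
        predicted = min(
            standards,
            key=lambda c: masked_distance(x, standards[c], individual))
        if predicted == label:
            correct += 1
    return correct / len(samples)

def parent_candidates(population, standards, samples, gamma0):
    """Set A(g): individuals whose recognition rate is at least gamma0."""
    return [ind for ind in population
            if recognition_rate(ind, standards, samples) >= gamma0]
```

An individual that keeps only uninformative features produces ties or misclassifications, falls below γ_0, and is excluded from A(g) before any fitness is computed.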

The fitness calculation circuit 45 calculates the fitness a(k) of each individual k belonging to the set A(g) obtained by the parent gene candidate selection circuit 44. When the feature-count reduction ratio of individual k is used as the fitness, a(k) is expressed as

$$a(k) = \frac{D_{\max} - D(k)}{D_{\max}} \qquad (1)$$

where D_max is the largest feature count among the individuals belonging to the set A(g) and D(k) is the feature count of the k-th individual. Equation (1) uses as the fitness the feature-count reduction ratio, obtained by dividing the number of features by which each individual falls below the maximum feature count by the maximum feature count. Another measure that expresses the relationship between the maximum feature count and the individual's feature count, such as the ratio of the individual's feature count to the maximum feature count, may be used instead.
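The feature-count reduction ratio described above is simple enough to compute directly. The following sketch assumes individuals are lists of 0/1 bits; the function names and the handling of the degenerate case where no candidate uses any feature are assumptions of the example.

```python
def feature_count(individual):
    """D(k): number of 1 bits, i.e. number of features used."""
    return sum(individual)

def fitness_values(candidates):
    """Fitness a(k) = (D_max - D(k)) / D_max for each candidate in A(g)."""
    d_max = max(feature_count(ind) for ind in candidates)
    if d_max == 0:
        # Degenerate case (no candidate uses any feature); not covered by
        # the specification, handled here by returning zero fitness.
        return [0.0] * len(candidates)
    return [(d_max - feature_count(ind)) / d_max for ind in candidates]
```

With candidates using 4, 2, and 1 features, the fitness values are 0.0, 0.5, and 0.75. Note that the candidate with the most features always receives fitness 0.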

The parent gene selection circuit 46 applies the fitness a(k) obtained by the fitness calculation circuit 45 to the set A(g) obtained by the parent gene candidate selection circuit 44, selects by the fitness-proportional strategy the individuals that become the parent genes of the next generation, and creates a parent gene set S(g) of K individuals. The fitness-proportional strategy is used here, but other methods such as the elite preservation strategy or the expected value strategy may be used.
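Fitness-proportional selection is commonly implemented as roulette-wheel sampling with replacement, and the sketch below takes that reading. The function name, the sampling with replacement, and the uniform fallback when every fitness is zero are assumptions of this example rather than details stated in the specification.

```python
import random

def select_parents(candidates, fitness, K, rng=None):
    """Draw K parents from A(g) with probability proportional to fitness."""
    rng = rng or random.Random()
    if sum(fitness) <= 0:
        # All candidates have zero fitness (e.g. identical feature counts):
        # fall back to a uniform draw. This fallback is an assumption.
        return [list(rng.choice(candidates)) for _ in range(K)]
    chosen = rng.choices(candidates, weights=fitness, k=K)
    return [list(ind) for ind in chosen]  # copies, so later edits are safe
```

Because the candidate with the maximum feature count has fitness 0 under the reduction ratio, it is never drawn by this scheme as long as some other candidate has positive fitness.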

The crossover operation circuit 47 applies crossover to the parent gene set S(g) obtained by the parent gene selection circuit 46 and creates a parent gene set S_c(g) of K individuals. In one-point crossover, two arbitrary individuals are taken from S(g), both are cut at a randomly generated address, and the bit strings after the cut point are swapped. In multi-point crossover, several crossover points are prepared and the genes are exchanged at all of them at once. The crossover may be one-point, multi-point, or another scheme.
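One-point crossover as described above can be sketched as follows. The function names, the restriction of the cut point to positions 1 through d-1 (so that both children differ from a plain copy when the parents differ), and the `n_pairs` parameter standing in for the predetermined number of repetitions are assumptions of this example.

```python
import random

def one_point_crossover(a, b, rng=None):
    """Cut both bit strings at one random address and swap the tails."""
    rng = rng or random.Random()
    point = rng.randrange(1, len(a))  # requires len(a) >= 2
    return a[:point] + b[point:], b[:point] + a[point:]

def crossover_population(parents, n_pairs, rng=None):
    """Build S_c(g): repeat n_pairs times, replacing two randomly chosen
    individuals by their two children, so the size K is unchanged."""
    rng = rng or random.Random()
    pop = [list(p) for p in parents]
    for _ in range(n_pairs):
        i, j = rng.sample(range(len(pop)), 2)
        pop[i], pop[j] = one_point_crossover(pop[i], pop[j], rng)
    return pop
```

Since crossover only exchanges bits between two individuals at the same addresses, the number of individuals using each feature is the same before and after the operation.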

The mutation operation circuit 48 takes arbitrary individuals from the parent gene set S_c(g) obtained by the crossover operation circuit 47, applies mutation, and creates a parent gene set S_m(g) of K individuals. Mutation inverts, with a predetermined probability, the bit at a randomly generated address of the selected individual.
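The mutation step can be sketched as below. This reading, in which one address is drawn per individual and its bit is flipped with probability `p_mut`, is one interpretation of the description; the function name and parameter name are hypothetical.

```python
import random

def mutate(individual, p_mut, rng=None):
    """Return a copy in which the bit at one randomly chosen address
    is inverted with probability p_mut."""
    rng = rng or random.Random()
    child = list(individual)
    address = rng.randrange(len(child))
    if rng.random() < p_mut:
        child[address] ^= 1  # flip 0 -> 1 or 1 -> 0
    return child
```

A flip from “0” to “1” re-admits a feature that selection had dropped, which is what lets the search escape a feature subset that is small but not the best available.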

The generational change continuation determination circuit 49 calculates a convergence measure for the genetic algorithm and determines whether the convergence criterion is satisfied or whether g has reached the predetermined maximum value g_max. If neither condition holds, then in the next generation g+1 the parent gene set S_m(g) replaces S(g), the series of processes is repeated, and a new parent gene set S_m(g) is obtained. When either condition is satisfied, the individual with the maximum fitness in the parent gene set S_m(g) is sent to the detailed identification dictionary creation unit 5. Possible convergence measures include the maximum fitness among the individuals of the gene population and the mean fitness of all individuals in the population, but other measures may be used.

The above processing yields the genes of generation g. Thereafter, the series of recognition by individuals, fitness calculation, selection, crossover, and mutation is treated as the processing of one generation and is repeated until the termination condition is satisfied.

The detailed identification dictionary creation unit 5 creates the detailed identification dictionary from the information of the parent gene set S_m(g) sent from the feature selection unit 4. FIG. 4 shows the structure of an embodiment of the detailed identification dictionary 6, which consists of category sets Ω_h, target categories C_h1, C_h2, …, C_{hL(h)}, and detailed identification features f_h1, f_h2, …, f_{hL(h)}. The category sets Ω_h (h = 1, 2, …, H) and their elements, the target categories C_h1, C_h2, …, C_{hL(h)}, are created in advance by a given method. The parent gene set S_m(g), created by applying the genetic algorithm with the categories that are elements of Ω_h as identification targets, is received from the feature selection unit 4. The addresses at which the bit string of the maximum-fitness individual of S_m(g) has “1” become the feature numbers, and the feature numbers f_h1, f_h2, …, f_{hL(h)} of the detailed identification features are stored at the prescribed position in the detailed identification dictionary 6. This processing is performed in order from Ω_1 up to the predetermined number H of category sets, which completes the detailed identification dictionary 6.
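A possible in-memory layout for the detailed identification dictionary is sketched below. The list-of-dicts structure, the key names, and the 0-based feature numbers are assumptions for illustration; the specification only states that each category set is stored with its target categories and the feature numbers selected for it.

```python
def build_detailed_dictionary(category_sets, best_individuals):
    """Create one entry per category set.

    category_sets:    list of category-name lists, one per set
    best_individuals: list of bit strings, the maximum-fitness individual
                      obtained for each category set, in the same order
    """
    dictionary = []
    for categories, individual in zip(category_sets, best_individuals):
        feature_numbers = [address for address, bit in enumerate(individual)
                           if bit == 1]
        dictionary.append({"categories": list(categories),
                           "feature_numbers": feature_numbers})
    return dictionary

def lookup(dictionary, category):
    """Return the entry whose category set contains `category`, or None."""
    for entry in dictionary:
        if category in entry["categories"]:
            return entry
    return None
```

Only feature numbers are stored, not feature values, which reflects the stated advantage that the detailed identification dictionary stays small because the values are read from the identification dictionary at recognition time.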

The operation of FIG. 2, an embodiment of the feature selection unit 4 that forms the main part of the character recognition dictionary creation method according to the present invention, will be described with reference to the flowchart of FIG. 5. The initial gene population creation circuit 41 creates an initial gene population consisting of K genes, each of which is a string of d bits taking the binary values “0” and “1” (step 201). The generation number g and the individual number k are each initialized to 1 (step 202). In generation g, the identification measure calculation circuit 42 uses only the features corresponding to the addresses at which the bit string of the k-th individual takes “1” to calculate the identification measure between the features of the standard patterns of the designated similar characters and the features of the character patterns in the learning data of those similar characters. Using this measure, the recognition rate calculation circuit 43 calculates the recognition rate γ_k of the k-th individual on the learning data for all K individuals, varying k = 1, 2, ..., K (step 203). The parent gene candidate selection circuit 44 creates the set A(g) of generation g, which collects only the individuals whose recognition rate γ_k is equal to or higher than a predetermined recognition rate γ_0, and takes these as the parent gene candidates (step 204). The fitness calculation circuit 45 obtains D_max, the largest number of features held by any individual belonging to A(g) (step 205), and, with D(k) denoting the number of features of the k-th individual, calculates the fitness a(k), which is defined as the feature-number reduction ratio of individual k (step 206). The parent gene selection circuit 46 applies the fitness a(k) to the set A(g), selects the individuals that become the next-generation parent genes by the fitness-proportional strategy, and creates the parent gene set S(g) consisting of K individuals (step 207). The crossover operation circuit 47 applies crossover to S(g) to create the parent gene set S_c(g) consisting of K individuals (step 208), and the mutation operation circuit 48 takes arbitrary individuals from S_c(g) and mutates them to create the parent gene set S_m(g) consisting of K individuals (step 209). The generation change continuation determination circuit 49 calculates a convergence measure of the genetic algorithm (step 210) and determines whether the convergence criterion is satisfied or whether g has exceeded the predetermined maximum value g_max (step 211). If neither condition holds, g is set to g + 1 (step 212), S_m(g) of generation g is used as S(g) of generation g + 1, and the series of processes is repeated to obtain a new S_m(g). If either condition holds, the individual with the maximum fitness in the parent gene set S_m(g) is sent to the detailed identification dictionary creation unit 5, and the process ends (step 213).
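The generation loop of steps 201 to 213 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the circuits of FIG. 2: the identification measure is assumed to be the squared Euclidean distance to each standard pattern, crossover is assumed to be one-point, the convergence test of step 210 is omitted so that only g_max stops the loop, a small constant is added to a(k) so that selection weights are never all zero, and an all-ones individual is seeded into the initial population. The names `recognition_rate`, `select_features`, `p_mut`, and `seed` are hypothetical. The fitness follows the feature-number reduction ratio a(k) = (D_max − D(k)) / D_max.

```python
import random


def recognition_rate(mask, samples, labels, prototypes):
    """Recognition rate on the learning data using only features whose bit is 1."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:  # no feature selected, so nothing can be recognized
        return 0.0
    correct = 0
    for x, y in zip(samples, labels):
        # Assumed identification measure: squared Euclidean distance.
        dist = {c: sum((x[i] - p[i]) ** 2 for i in idx)
                for c, p in prototypes.items()}
        correct += min(dist, key=dist.get) == y
    return correct / len(samples)


def select_features(samples, labels, prototypes, K=20, gamma0=0.9,
                    g_max=30, p_mut=0.02, seed=0):
    """Return the feature numbers of the leanest individual that reached gamma0."""
    rng = random.Random(seed)
    d = len(samples[0])  # assumes d >= 2 so that a crossover point exists
    # Step 201: K individuals of d bits (the all-ones seed is an assumption).
    S = [[1] * d] + [[rng.randint(0, 1) for _ in range(d)] for _ in range(K - 1)]
    best = None
    for g in range(1, g_max + 1):  # g is the generation number
        # Steps 203-204: candidate set A(g) of individuals with rate >= gamma0.
        A = [ind for ind in S
             if recognition_rate(ind, samples, labels, prototypes) >= gamma0]
        if not A:
            break
        leanest = min(A, key=sum)
        if best is None or sum(leanest) < sum(best):
            best = list(leanest)
        # Steps 205-206: fitness a(k) = (D_max - D(k)) / D_max.
        D_max = max(sum(ind) for ind in A)
        fitness = [(D_max - sum(ind)) / D_max for ind in A]
        # Step 207: fitness-proportional selection of K parents.
        weights = [a + 1e-9 for a in fitness]  # epsilon is an assumption
        S = [list(ind) for ind in rng.choices(A, weights=weights, k=K)]
        # Step 208: one-point crossover on consecutive pairs (assumed operator).
        for i in range(0, K - 1, 2):
            cut = rng.randrange(1, d)
            S[i][cut:], S[i + 1][cut:] = S[i + 1][cut:], S[i][cut:]
        # Step 209: flip each bit with probability p_mut.
        for ind in S:
            for j in range(d):
                if rng.random() < p_mut:
                    ind[j] ^= 1
    return [j for j, bit in enumerate(best) if bit] if best else []
```

Unlike step 213, which takes the fittest individual of the final S_m(g), this sketch returns the leanest qualifying individual seen in any generation, so the result never falls below γ_0 on the learning data.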

FIG. 6 is a block diagram of a character recognition apparatus showing an embodiment of the present invention. It comprises an input pattern memory unit 1, a preprocessing unit 2, a feature extraction unit 3, a large classification unit 7, a large classification dictionary 8, an identification unit 9, an identification dictionary 10, a determination unit 11, a detailed identification unit 12, a detailed identification dictionary 6, a recognition result memory unit 13, and a control unit 14.

The input pattern memory unit 1 captures a character pattern through an input device such as a scanner or a television camera. The preprocessing unit 2 performs normalization, noise removal, and the like, and the feature extraction unit 3 extracts the features used for recognition from the input character pattern. The large classification unit 7 calculates, for each category, a measure of closeness between the features of the input character pattern obtained by the feature extraction unit 3 and the features of the standard patterns stored in the large classification dictionary 8, sorts the measures in ascending order, and outputs them together with the candidates to the identification unit 9. For the candidates it receives, the identification unit 9 calculates, for each category, a measure of closeness between the features of the input character pattern and the features of the standard patterns stored in the identification dictionary 10, and outputs the measures sorted in ascending order, together with the corresponding candidates, to the determination unit 11 as the identification result. The determination unit 11 judges the reliability of the result output by the identification unit 9 using a predetermined conditional expression. If the condition is satisfied, the identification result is sent to the recognition result memory unit 13 and stored. If the condition is not satisfied, the candidates obtained by the identification unit 9 are sent to the detailed identification unit 12. For the candidates it receives, the detailed identification unit 12 calculates, for each category, a measure of closeness between the input character pattern and the categories belonging to the category set stored in the detailed identification dictionary 6, and outputs the measures sorted in ascending order, together with the corresponding candidates, to the recognition result memory unit 13 as the identification result.
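The two ranking stages described above (large classification followed by identification) can be sketched in Python as follows. This is an illustration only: the closeness measure is assumed to be the squared Euclidean distance, each dictionary is modeled as a mapping from category name to standard-pattern feature list, and the names `rank_candidates`, `classify_then_identify`, and the shortlist size `top_n` are hypothetical and do not appear in the specification.

```python
def rank_candidates(features, dictionary, candidates=None):
    """Return (measure, category) pairs sorted in ascending order of the measure."""
    categories = dictionary.keys() if candidates is None else candidates
    scored = [
        # Assumed closeness measure: squared Euclidean distance.
        (sum((x - p) ** 2 for x, p in zip(features, dictionary[c])), c)
        for c in categories
    ]
    return sorted(scored)


def classify_then_identify(features, coarse_dict, ident_dict, top_n=10):
    """Large classification narrows the categories; identification re-ranks them."""
    coarse = rank_candidates(features, coarse_dict)
    shortlist = [c for _, c in coarse[:top_n]]  # candidates passed to identification
    return rank_candidates(features, ident_dict, shortlist)
```

The returned list plays the role of the candidate string: its first element is the first candidate with the smallest measure.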

Next, the operation of the determination unit 11 will be described. First, the first candidate C_t and its identification measure d_t, and the second candidate C_s and its identification measure d_s (where d_t ≤ d_s), are taken from the output of the identification unit 9. Next, the determination thresholds θ_t1 and θ_t2 corresponding to C_t are read from the determination table, and the identification result is judged to be highly reliable when both of the following expressions (2) and (3) are satisfied.
d_t ≤ θ_t1 (2)
d_s − d_t ≥ θ_t2 (3)
Expression (2) states that the identification measure d_t of the first candidate C_t is no greater than the determination threshold θ_t1. Expression (3) states that the difference Δd = d_s − d_t between the identification measure d_t of the first candidate C_t and the identification measure d_s of the second candidate C_s is no less than the determination threshold θ_t2. The determination thresholds θ_t1 and θ_t2 are set by analysis using a large number of learning patterns so that the misreading rate when expressions (2) and (3) are satisfied is at most a predetermined value.
If the reliability is judged to be high, the candidate string obtained by identification and the identification measures of that candidate string are sent to the recognition result memory unit 13. If at least one of expressions (2) and (3) is not satisfied, the determination unit 11 does not judge the reliability to be high; the candidate string obtained by identification is transferred to the detailed identification unit 12, and the candidate string and identification measures obtained by the detailed identification unit 12 are output to the recognition result memory unit 13 as the recognition result.
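The reliability test of expressions (2) and (3) can be written as a short Python function. This is a sketch under stated assumptions: the candidate string is modeled as a list of (measure, category) pairs already sorted in ascending order with at least two entries, and the determination table is modeled as a mapping from the first candidate's category to the pair (θ_t1, θ_t2). The name `is_reliable` and both data layouts are hypothetical.

```python
def is_reliable(ranked, thresholds):
    """Return True when expressions (2) and (3) both hold for the top two candidates."""
    (d_t, c_t), (d_s, _c_s) = ranked[0], ranked[1]
    theta_t1, theta_t2 = thresholds[c_t]  # thresholds depend on the first candidate
    within_distance = d_t <= theta_t1          # expression (2)
    clear_margin = (d_s - d_t) >= theta_t2     # expression (3)
    return within_distance and clear_margin
```

For example, with thresholds (0.5, 1.0) for category "B", a result whose first candidate "B" has measure 0.4 and whose second candidate has measure 0.6 satisfies expression (2) but fails expression (3), so it would be passed on to detailed identification.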

Next, the operation of the detailed identification unit 12 will be described with reference to FIG. 7. FIG. 7 is a functional block diagram of the detailed identification unit 12 showing an embodiment of the present invention. It comprises a category set readout circuit 121, a detailed identification feature readout circuit 122, a detailed identification measure calculation circuit 123, and a sort circuit 124.

The category set readout circuit 121 takes the first candidate C_t from the information sent by the identification unit 9 and transferred via the determination unit 11, searches the target categories of the detailed identification dictionary 6 to find the category set Ω_t to which C_t belongs, and obtains the target categories {C_t1, C_t2, ..., C_tL(t)} that are the elements of Ω_t.

After Ω_t has been found in the detailed identification dictionary 6, the detailed identification feature readout circuit 122 retrieves the detailed identification features {f_t1, f_t2, ..., f_tL(t)} corresponding to Ω_t.

The detailed identification measure calculation circuit 123 reads, from the standard patterns of the target categories {C_t1, C_t2, ..., C_tL(t)} in the identification dictionary 10, the features designated by the feature numbers of the detailed identification features {f_t1, f_t2, ..., f_tL(t)}, calculates the detailed identification measure between these and the features obtained from the input character pattern, and obtains the detailed identification measures {p_t1, p_t2, ..., p_tL(t)} of the target categories.

The sort circuit 124 sorts the detailed identification measures {p_t1, p_t2, ..., p_tL(t)} of the target categories {C_t1, C_t2, ..., C_tL(t)} in ascending order, and outputs the target categories paired with their detailed identification measures to the recognition result memory unit 13.
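The four circuits 121 to 124 can be sketched as one Python function. This is a minimal illustration, not the circuit design: the detailed identification dictionary is assumed to be a list of (category set, feature numbers) entries in which one list of feature numbers is shared by the whole set, the detailed identification measure is assumed to be the squared Euclidean distance restricted to those feature numbers, and returning `None` when the first candidate belongs to no category set is an assumption. The name `detailed_identify` is hypothetical.

```python
def detailed_identify(features, first_candidate, detailed_dict, ident_dict):
    """Re-rank the similar characters of the first candidate on selected features."""
    # Circuits 121 and 122: find the category set and read its feature numbers.
    for categories, feature_numbers in detailed_dict:
        if first_candidate in categories:
            break
    else:
        return None  # assumed behavior: no similar-character set contains C_t
    # Circuit 123: measure computed only on the designated feature numbers.
    scored = []
    for c in categories:
        pattern = ident_dict[c]
        p = sum((features[i] - pattern[i]) ** 2 for i in feature_numbers)
        scored.append((p, c))
    # Circuit 124: ascending order, each measure paired with its category.
    return sorted(scored)
```

Because only the listed feature numbers enter the measure, features that do not help separate the similar characters cannot dominate the result.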

Each process of FIG. 6 described above is controlled by signals from the control unit 14.

The operation of FIG. 6, an embodiment of the character recognition method according to the present invention, will be described with reference to the flowchart of FIG. 8. A character pattern is taken into the input pattern memory unit 1 through an input device such as a scanner or a television camera (step 301), preprocessing such as normalization and noise removal is applied by the preprocessing unit 2 (step 302), and the features used for recognition are extracted from the input character pattern by the feature extraction unit 3 (step 303). The obtained features are sent to the large classification unit 7, which calculates, for each category, a measure of closeness between the features of the input character pattern and the features of the standard patterns stored in the large classification dictionary 8, sorts the measures in ascending order, and outputs them together with the candidates to the identification unit 9 (step 304). For the candidates it receives, the identification unit 9 calculates, for each category, a measure of closeness between the features of the input character pattern and the features of the standard patterns stored in the identification dictionary 10, and the measures sorted in ascending order, together with the corresponding candidates, are output to the determination unit 11 as the identification result (step 305). The determination unit 11 judges the reliability of the result of the identification unit 9 using a predetermined conditional expression (step 306). If the condition is satisfied, the identification result is sent to the recognition result memory unit 13 and stored (step 311); if it is not satisfied, the candidates obtained by the identification unit 9 are sent to the detailed identification unit 12.
The category set readout circuit 121 of the detailed identification unit 12 takes the first candidate C_t from the information sent by the identification unit 9, searches the target categories stored in the detailed identification dictionary 6 to find the category set Ω_t to which C_t belongs (step 307), and takes the elements {C_t1, C_t2, ..., C_tL(t)} of Ω_t as the detailed identification target categories (step 308). The detailed identification feature readout circuit 122 retrieves the detailed identification features {f_t1, f_t2, ..., f_tL(t)} corresponding to Ω_t (step 309). The detailed identification measure calculation circuit 123 reads, from the standard patterns of the target categories {C_t1, C_t2, ..., C_tL(t)} in the identification dictionary 10, the features designated by the feature numbers of the detailed identification features {f_t1, f_t2, ..., f_tL(t)}, and calculates the detailed identification measure between these and the features obtained from the input character pattern (step 310). The identification result, in which the target categories are paired with their detailed identification measures, is output to the recognition result memory unit 13 (step 311).
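The control flow of steps 304 to 311 can be summarized in a few lines of Python. This sketch shows only the branching; the four stages are passed in as callables, so their internals remain whatever the apparatus provides. The function name `recognize`, the parameter names, and the fallback to the identification result when detailed identification finds no category set are assumptions that the specification does not state.

```python
def recognize(features, coarse, identify, is_reliable, detailed):
    """Run large classification, identification, judgment, and detailed identification.

    coarse(features)             -> shortlist of categories          (step 304)
    identify(features, shortlist)-> sorted (measure, category) list  (step 305)
    is_reliable(ranked)          -> bool                             (step 306)
    detailed(features, category) -> sorted list or None              (steps 307-310)
    """
    shortlist = coarse(features)
    ranked = identify(features, shortlist)
    if is_reliable(ranked):
        return ranked  # step 311: identification result is stored as it is
    refined = detailed(features, ranked[0][1])  # first candidate C_t
    # Assumed fallback: keep the identification result if no category set matches.
    return ranked if refined is None else refined
```

Only results that fail the reliability test reach the detailed stage, which is where the reduced feature set is applied.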

As described above, this embodiment uses a genetic algorithm to reduce the features used for identification by carrying out generation change on genes whose recognition rate is equal to or higher than a fixed value. Recognition performance can therefore be improved compared with methods that do not use recognition, such as difference features, principal component analysis, and discriminant analysis.

Further, the present invention can easily be realized by constructing the components shown in FIG. 1 and FIG. 6 as a program, storing the program in a disk device or in a portable storage medium such as a flexible disk or CD-ROM, and, when character recognition is to be performed, installing the program on a computer or character recognition apparatus to which the portable storage medium can be connected.

Although the present invention has been specifically described above based on the embodiment, it goes without saying that the present invention is not limited to that embodiment and can be modified in various ways without departing from its gist.

FIG. 1 is a block diagram of an embodiment of the character recognition dictionary creation apparatus of the present invention.
FIG. 2 is a block diagram of the feature selection unit 4 used in the character recognition dictionary creation apparatus shown in FIG. 1.
FIG. 3 is a diagram showing an example of the gene population created by the initial gene population creation circuit 41 of FIG. 2.
FIG. 4 is a diagram showing an example of the detailed identification dictionary 6 shown in FIG. 1.
FIG. 5 is a flowchart of an embodiment of the character recognition dictionary creation method of the present invention.
FIG. 6 is a block diagram of an embodiment of the character recognition apparatus of the present invention.
FIG. 7 is a block diagram of the detailed identification unit 12 used in the character recognition apparatus shown in FIG. 6.
FIG. 8 is a flowchart of an embodiment of the character recognition method of the present invention.

Explanation of symbols

1 Input pattern memory unit
2 Preprocessing unit
3 Feature extraction unit
4 Feature selection unit
5 Detailed identification dictionary creation unit
6 Detailed identification dictionary
7 Large classification unit
8 Large classification dictionary
9 Identification unit
10 Identification dictionary
11 Determination unit
12 Detailed identification unit
13 Recognition result memory unit
14 Control unit
41 Initial gene population creation circuit
42 Identification measure calculation circuit
43 Recognition rate calculation circuit
44 Parent gene candidate selection circuit
45 Fitness calculation circuit
46 Parent gene selection circuit
47 Crossover operation circuit
48 Mutation operation circuit
49 Generation change continuation determination circuit
121 Category set readout circuit
122 Detailed identification feature readout circuit
123 Detailed identification measure calculation circuit
124 Sort circuit

Claims

A character recognition dictionary creation method for creating a standard pattern of each category by using learning data in which a large number of character patterns are collected per category, extracting features of the character patterns, and performing statistical processing, the method comprising:
preparing a plurality of genes, each consisting of a bit string of as many binary bits taking “0” or “1” as there are features, and creating an initial gene population by assigning a value of “0” or “1” to each bit;
performing, for all individuals, a process of calculating, using only the features corresponding to the addresses at which the bits of an individual of the initial gene population take the value “1”, an identification measure between the features of the standard patterns of designated similar characters and the features of the character patterns in the learning data of the similar characters, and calculating the recognition rate of the individual;
creating, as a parent gene candidate set, a set of only those individuals whose recognition rate is equal to or higher than a predetermined recognition rate;
obtaining, for each individual of the parent gene candidate set, a fitness defined from the viewpoint of maintaining the predetermined recognition rate with a small number of features;
creating a parent gene set by selecting the individuals that become next-generation parent genes using the fitness of each individual of the parent gene candidate set;
performing, a predetermined number of times, an operation of taking two individuals out of the parent gene set, exchanging genes between the two individuals by crossover, and returning the two newly generated individuals to the parent gene set;
taking an individual out of the parent gene set and applying, with a predetermined probability, a mutation that inverts the value “0” or “1” of part of its gene;
repeating generation change through the above series of recognition, fitness calculation, selection, crossover, and mutation on the individuals until either a predetermined convergence criterion is satisfied or a predetermined maximum number of generations is reached, and taking out the individual with the maximum fitness from the finally obtained parent gene set; and
creating a detailed identification dictionary of the similar characters using the category names of the designated similar characters and the feature numbers corresponding to the addresses having the value “1” in the individual with the maximum fitness.

The character recognition dictionary creation method according to claim 1, wherein the number of features of an individual is the total number of “1”s it has, the maximum number of features among the individuals of the generation is obtained, and the fitness is the feature-number reduction ratio defined as the number of features by which each individual falls below that maximum, divided by the maximum.

A character recognition method for outputting the categories to which an input character pattern belongs as a candidate string in ascending order of a measure of closeness calculated between the features of the input character pattern and the features of a standard dictionary, the method comprising:
determining, from the identification measure of the first candidate obtained by identification and the identification measure of the second candidate, whether the first candidate is the correct category;
outputting the first candidate as the recognition result if the reliability is determined to be high, and sending the candidates obtained by identification to a subsequent detailed identification unit as recognition targets if the reliability is not determined to be high;
searching, in the detailed identification unit, a detailed identification dictionary to obtain the category set to which the first candidate belongs and the detailed identification features used to identify that category set; and
reading the features designated by the feature numbers of the detailed identification features from the identification dictionary of the category set, performing detailed identification using those features and the corresponding detailed identification features among the features obtained from the input character pattern, and outputting the obtained result as the recognition result.

A character recognition dictionary creation device that creates a standard pattern of each category by using learning data in which a large number of character patterns are collected per category, extracting features of the character patterns, and performing statistical processing, the device comprising:
detailed identification dictionary creating means that prepares a plurality of genes, each consisting of a bit string of as many binary bits taking “0” or “1” as there are features, and creates an initial gene population by assigning a value of “0” or “1” to each bit,
performs, for all individuals, a process of calculating, using only the features corresponding to the addresses at which the bits of an individual of the initial gene population take the value “1”, an identification measure between the features of the standard patterns of designated similar characters and the features of the character patterns in the learning data of the similar characters, and calculating the recognition rate of the individual,
creates, as a parent gene candidate set, a set of only those individuals whose recognition rate is equal to or higher than a predetermined recognition rate,
obtains, for each individual of the parent gene candidate set, a fitness defined from the viewpoint of maintaining the predetermined recognition rate with a small number of features,
creates a parent gene set by selecting the individuals that become next-generation parent genes using the fitness of each individual of the parent gene candidate set,
performs, a predetermined number of times, an operation of taking two individuals out of the parent gene set, exchanging genes between the two individuals by crossover, and returning the two newly generated individuals to the parent gene set,
takes an individual out of the parent gene set and applies, with a predetermined probability, a mutation that inverts the value “0” or “1” of part of its gene, and
repeats generation change through the above series of recognition, fitness calculation, selection, crossover, and mutation on the individuals until either a predetermined convergence criterion is satisfied or a predetermined maximum number of generations is reached, and takes out the individual with the maximum fitness from the finally obtained parent gene set; and
detailed identification dictionary means for creating a detailed identification dictionary of the similar characters using the category names of the designated similar characters and the feature numbers corresponding to the addresses having the value “1” in the individual with the maximum fitness.

The character recognition dictionary creation device according to claim 4, wherein the number of features of an individual is the total number of “1”s it has, the maximum number of features among the individuals of the generation is obtained, and the fitness is the feature-number reduction ratio defined as the number of features by which each individual falls below that maximum, divided by the maximum.

A character recognition device that outputs the categories to which an input character pattern belongs as a candidate string in ascending order of a measure of closeness calculated between the features of the input character pattern and the features of standard patterns, the device comprising:
determination means for determining, from the identification measure of the first candidate obtained by identification and the identification measure of the second candidate, whether the first candidate is the correct category,
outputting the first candidate as the recognition result if the reliability is determined to be high, and sending the candidates obtained by identification to a subsequent detailed identification unit as recognition targets if the reliability is not determined to be high; and
detailed identification means for searching a detailed identification dictionary to obtain the category set to which the first candidate belongs and the detailed identification features used to identify that category set,
reading the features designated by the feature numbers of the detailed identification features from the identification dictionary of the category set, performing detailed identification using those features and the corresponding detailed identification features among the features obtained from the input character pattern, and outputting the obtained result as the recognition result.

A storage medium storing a program for causing a computer to execute the processing steps in the character recognition dictionary creating method according to claim 1.

5. A storage medium characterized in that a program for causing a computer to execute the processing steps in the character recognition method according to claim 4 is stored in a computer-readable medium.