JP5240049B2

JP5240049B2 - Character identification program, character identification device

Info

Publication number: JP5240049B2
Application number: JP2009108380A
Authority: JP
Inventors: 浩明武部; 悦伸堀田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2009-04-27
Filing date: 2009-04-27
Publication date: 2013-07-17
Anticipated expiration: 2029-04-27
Also published as: JP2010257339A

Description

本発明は、文字識別プログラム、文字識別装置に関し、例えば、類似する文字を高精度に認識する文字識別プログラム、文字識別装置に関する。 The present invention relates to a character identification program and a character identification device, for example, a character identification program and a character identification device that recognize a similar character with high accuracy.

一般に、イメージスキャナーで取り込まれる文書に含まれる文字をテキストデータに変換する手法として光学文字認識（以下、ＯＣＲ:Optical Character Recognitionとする）が広く知られている。 In general, optical character recognition (hereinafter referred to as OCR: Optical Character Recognition) is widely known as a method for converting characters included in a document captured by an image scanner into text data.

OCR is required to recognize characters with high accuracy and, in particular, to distinguish similar characters with high accuracy. An example of similar characters is “犬” (dog) and “大” (large): if a character that should be recognized as “犬” is recognized as “大”, the result is a misreading.

One method for distinguishing such similar characters is the mixed identification method. It combines the Mahalanobis distance or the projection distance method with the difference between the features of the similar characters to determine, for any given character, which of the similar characters it resembles.

上述した「類似文字の特徴の差異」とは、各類似文字の文字カテゴリから抽出される情報に基づいて求められる。この「文字カテゴリ」とは、ある所定の文字における、文字のゆらぎ、文字線の傾き、文字面積等をパラメーターとする特徴ベクトルを用いて定義される。 The above-mentioned “difference in characteristics of similar characters” is obtained based on information extracted from the character category of each similar character. This “character category” is defined using a feature vector whose parameters are, for example, character fluctuation, character line inclination, and character area of a predetermined character.

一般に、特徴ベクトルは、特徴空間上に示されることから、上述した「文字カテゴリ」は、特徴空間上の分布として示されることになる。これを文字カテゴリ分布と呼ぶことにする。なお、文字のゆらぎ、文字線の傾き、文字面積等といったパラメーターを便宜的に「特徴量」とする。 In general, since the feature vector is shown on the feature space, the above-mentioned “character category” is shown as a distribution on the feature space. This is called a character category distribution. For convenience, parameters such as character fluctuation, character line inclination, and character area are referred to as “features”.

続いて、上述した混合識別方式について具体的に説明する。図１２は、従来の混合識別方式を説明するための図である。上述したように、特徴空間は、高次元で定義されるが、図１２の特徴空間１０は、説明の便宜上、例えば、文字線の傾きで定義される特徴量と、文字面積で定義される特徴量とで定義される次元にて表すものとする。 Next, the above-described mixed identification method will be specifically described. FIG. 12 is a diagram for explaining a conventional mixed identification method. As described above, the feature space is defined in a high dimension, but the feature space 10 in FIG. 12 is, for convenience of explanation, for example, a feature amount defined by the inclination of a character line and a feature defined by a character area. It shall be expressed in a dimension defined by quantity.

First, the character category distribution A and the character category distribution B in FIG. 12 (hereinafter simply category distributions A and B) will be described. Category distribution A is the distribution formed by instances of the same character that differ in character line inclination, character area, and so on. The same applies to category distribution B. Here, “instances of the same character” means the same character rendered in different fonts or point sizes, or scanned images of the same character that differ as images because of reading errors, the presence of dust, and the like.

Category A and category B are similar to each other. For example, when category A is “犬” (dog), category B is “大” (large); when category A is “烏” (crow), category B is “鳥” (bird).

そして、任意の文字が入力された場合、入力された文字が、カテゴリＡに属するのか、もしくは、カテゴリＢに属するのかを識別する類似文字識別を行う。 When an arbitrary character is input, similar character identification is performed to identify whether the input character belongs to category A or category B.

Identifying which of two mutually similar character categories an arbitrary character belongs to is referred to below as “similar character identification”.

This “similar character identification” is performed using, for example, a mixed discriminant function in which the original function is defined by the Mahalanobis distance and the above-described difference between the features of the similar characters is added as a correction term. The calculation of this correction term is described below.

First, the difference between the features of the similar characters is treated as the difference vector between the average vectors (a, b) of category A and category B, and an arbitrary character to be identified, the input pattern p, is assumed to be given.

In this case, the value p1 obtained by projecting the input pattern p onto the difference vector is calculated. Then m1 and m2, obtained from p1 and the average vector of each character category, serve as the correction terms for the discriminant functions of category A and category B.

Then, the discriminant function value f_A(p) between category A and the input pattern p is obtained by the mixed discriminant function shown in FIG. 12 with m1 as its correction term; similarly, the discriminant function value f_B(p) between category B and the input pattern p is obtained by the mixed discriminant function with m2 as its correction term.

そして、算出した各識別関数値に基づいて、入力パターンｐがカテゴリＡ、カテゴリＢのどちらに属するのかを識別し、入力パターンｐの属するカテゴリが定められる。これは、入力パターンｐに対し、類似文字の特徴の差異を用いた識別を行ったことに相当する。 Then, based on each calculated identification function value, it is identified whether the input pattern p belongs to category A or category B, and the category to which the input pattern p belongs is determined. This is equivalent to identifying the input pattern p using the difference in characteristics of similar characters.
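The projection step of the conventional method can be illustrated with a minimal sketch. The text does not state how m1 and m2 are derived from p1 and the average vectors, so the squared-distance form used here is an assumption, and the function name `correction_terms` is hypothetical.

```python
import math

def correction_terms(p, a, b):
    """Project p onto the difference vector b - a and derive assumed correction terms."""
    diff = [bi - ai for ai, bi in zip(a, b)]
    length = math.sqrt(sum(d * d for d in diff))
    # p1: coordinate of p along the unit difference vector, measured from mean a.
    p1 = sum((pi - ai) * d for pi, ai, d in zip(p, a, diff)) / length
    # ASSUMPTION: each correction term is the squared distance, along the
    # difference axis, between p1 and the corresponding category mean.
    m1 = p1 ** 2              # mean a sits at coordinate 0
    m2 = (p1 - length) ** 2   # mean b sits at coordinate |b - a|
    return p1, m1, m2
```

In this sketch, an input pattern lying closer to the average vector of category A along the difference axis receives the smaller correction term m1, which would then be added to the Mahalanobis-based original function for category A.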

As another example of identification using the difference between the features of similar characters, there is a method that identifies similar characters using an emphasis vector in addition to the difference vector. The emphasis vector is a vector formed from the value of the input pattern and the vector normal to the identification boundary surface between the similar characters (see, for example, Patent Document 1).

Patent Document 1: JP 2002-183664 A

しかしながら、上述した従来の技術では、類似文字の特徴の差異を高精度に識別することができないという問題があった。 However, the above-described conventional technique has a problem in that it is not possible to identify a difference in characteristics of similar characters with high accuracy.

例えば、図１２に示した例では、カテゴリＡとカテゴリＢの各平均ベクトルからなる差分ベクトルを用いて類似文字の特徴の差異を算出したが、類似文字の特徴の差異は、特徴空間上に広がっており、差分ベクトル以外のベクトルで示される他の情報が特徴空間上に存在している。 For example, in the example illustrated in FIG. 12, the difference between the features of similar characters is calculated using a difference vector composed of the average vectors of category A and category B. However, the difference in features of the similar characters spreads over the feature space. In addition, other information indicated by vectors other than the difference vector exists in the feature space.

したがって、差分ベクトルを用いて、類似文字の特徴の差異を算出しても、差分ベクトル以外のベクトルで示される他の情報が使用されていない。その結果、識別に必要な他の情報が不足しているため、類似文字を高精度に認識するには限界がある。 Therefore, even if the difference between the features of similar characters is calculated using the difference vector, other information indicated by the vector other than the difference vector is not used. As a result, since other information necessary for identification is lacking, there is a limit to recognizing similar characters with high accuracy.

開示の技術は、上記に鑑みてなされたものであって、類似文字カテゴリ間の差異を高精度に識別する文字識別プログラム、文字識別装置を提供することを目的とする。 The disclosed technology has been made in view of the above, and an object of the present invention is to provide a character identification program and a character identification device that identify differences between similar character categories with high accuracy.

The character identification program disclosed in the present application causes a computer to execute: a common distribution processing procedure that specifies, in a feature space and for each of the characters to be compared, a character category containing instances of the same character with different shapes, and obtains information common to the character categories as a common distribution; a correction procedure that corrects each character category distribution based on the common distribution and each character category distribution; and a character identification procedure that, when a character to be identified is acquired, identifies the character based on the position of the character in the feature space and each character category distribution corrected by the correction procedure.

本願の開示する文字識別プログラムによれば、類似文字カテゴリ間の差異を高精度に識別するという効果を奏する。 According to the character identification program disclosed in the present application, there is an effect that a difference between similar character categories is identified with high accuracy.

FIG. 1 is a diagram for explaining the outline of the embodiment.
FIG. 2 is a diagram illustrating the character identification device according to the first embodiment.
FIG. 3 is a diagram for explaining an example of the data structure of the similar character table.
FIG. 4 is a diagram for explaining an example of the data structure of the character recognition dictionary.
FIG. 5 is a diagram for explaining similar character identification processing.
FIG. 6 is a flowchart illustrating the process of the character identification device according to the first embodiment.
FIG. 7 is a diagram illustrating the character identification device according to the second embodiment.
FIG. 8 is a diagram illustrating an example of the data structure of the similar character recognition dictionary.
FIG. 9 is a diagram illustrating an example of an interface.
FIG. 10 is a flowchart illustrating the process of the character identification device according to the second embodiment.
FIG. 11 is a diagram illustrating a hardware configuration of a computer constituting the character identification device according to the first embodiment.
FIG. 12 is a diagram for explaining a conventional mixed identification method.

以下に、本願の開示する文字識別プログラム、文字識別装置の実施例を図面に基づいて説明する。なお、この実施例によりこの発明が限定されるものではない。 Embodiments of a character identification program and a character identification device disclosed in the present application will be described below with reference to the drawings. Note that the present invention is not limited to the embodiments.

まず、本実施例１に係る文字識別装置の概要について説明する。実施例１に係る文字識別装置は、最初に認識対象となる任意の文字を含む画像データが与えられ、この任意の文字が、文字カテゴリＡに属するのか、それとも、文字カテゴリＢに属するのか判定を行い、任意の文字に対する識別を行う。なお、文字カテゴリＡと文字カテゴリＢは、互いに類似するものとする。 First, an outline of the character identification device according to the first embodiment will be described. The character identification device according to the first embodiment is initially given image data including an arbitrary character to be recognized, and determines whether the arbitrary character belongs to the character category A or the character category B. To identify any character. Note that the character category A and the character category B are similar to each other.

Identifying which of two mutually similar character categories an arbitrary character belongs to is referred to below as “similar character identification”.

When performing this “similar character identification”, the character identification device of the first embodiment obtains the information common to the mutually similar character categories as a common distribution on the feature space, convolves the obtained common distribution with the distribution of each similar character, and performs identification using the convolved distributions. A specific description follows with reference to the drawings.

図１は、実施例の概要を説明するための図である。図１に示した例では、任意の入力パターンｐが、図１２に示した特徴空間１０上に与えられ、混合識別関数を用いて、カテゴリＡもしくはＢのどちらに類似するかの識別を行う。 FIG. 1 is a diagram for explaining the outline of the embodiment. In the example shown in FIG. 1, an arbitrary input pattern p is given on the feature space 10 shown in FIG. 12, and it is discriminated whether it is similar to category A or B using a mixed discriminant function.

Categories A and B are the character categories described in the related art, each defined using feature vectors whose parameters are, for a given character, the character fluctuation, character line inclination, character area, and so on.

一般に、この特徴ベクトルは、特徴空間上に示されることから、上述した「文字カテゴリ」は、特徴ベクトルで定義される特徴空間上の分布として示される。これを文字カテゴリ分布と呼ぶことにする。なお、文字のゆらぎ、文字線の傾き、文字面積等といったパラメーターを便宜的に「特徴量」とする。 In general, since this feature vector is shown on the feature space, the above-mentioned “character category” is shown as a distribution on the feature space defined by the feature vector. This is called a character category distribution. For convenience, parameters such as character fluctuation, character line inclination, and character area are referred to as “features”.

まず、各類似文字カテゴリが示す特徴空間上の分布から共通する形状の共通分布５０を入力パターンｐの値に関らず求める（ステップＳ１０）。なお、共通分布５０内の「ｃ」は、カテゴリＡとカテゴリＢの平均ベクトルが重なっているベクトルを示す。 First, a common distribution 50 having a common shape is obtained from the distribution on the feature space indicated by each similar character category regardless of the value of the input pattern p (step S10). Note that “c” in the common distribution 50 indicates a vector in which the average vectors of category A and category B overlap.

そして、入力パターンｐをガウス分布の広がりとして捉えた場合に、ステップＳ１０で求めた共通分布５０を用いて、入力パターンｐの分布を仮定する（ステップＳ２０）。 Then, when the input pattern p is regarded as the spread of the Gaussian distribution, the distribution of the input pattern p is assumed using the common distribution 50 obtained in step S10 (step S20).

Then, the common distribution 50 is convolved, category by category, with the distributions of category A and category B, so that the distribution of category A and the distribution of category B are each deformed by the common distribution 50 (step S30).

以上のステップＳ１０〜ステップＳ３０は、入力パターンｐのベクトル周辺に共通分布５０を仮定することにより、入力パターンｐの分布形状を考慮した識別を行うことになる。つまり、入力パターンｐの各特徴量を十分に考慮した識別を行うことに相当する。 In the above steps S10 to S30, the common distribution 50 is assumed around the vector of the input pattern p, so that the identification considering the distribution shape of the input pattern p is performed. That is, this is equivalent to performing identification in consideration of each feature amount of the input pattern p.

When the common distribution 50, assumed as the distribution of the input pattern p, and the distributions of category A and category B are all assumed to be normal distributions, convolution with the common distribution 50 is applied to the distributions of category A and category B.

The reasoning is that, when the common distribution 50 is regarded as a Gaussian distribution, differences between the similar character categories are unlikely to appear in directions where the spread is large and likely to appear in directions where the spread is small, so the directions with small spread are emphasized. This leads to identification that emphasizes the features of each similar character.

Then, based on the distribution of category A deformed using the common distribution 50, a mixed discriminant function for category A (referred to, for example, as mixed discriminant function A) is obtained, and a discriminant function value A is calculated from the mixed discriminant function A.

Likewise, based on the distribution of category B deformed using the common distribution 50, a mixed discriminant function for category B (referred to, for example, as mixed discriminant function B) is obtained, and a discriminant function value B is calculated from the mixed discriminant function B.

そして、識別関数値Ａ、識別関数値Ｂに基づいて、入力パターンｐの識別を行う。図１に示した例では、カテゴリＡと入力パターンｐとの識別関数値は「０．６」となり、カテゴリＢと入力パターンｐとの識別関数値は「０．４」となる。 Then, the input pattern p is identified based on the identification function value A and the identification function value B. In the example shown in FIG. 1, the discriminant function value between the category A and the input pattern p is “0.6”, and the discriminant function value between the category B and the input pattern p is “0.4”.

As described above, because identification emphasizes the directions in which the spread is small, a smaller calculated discriminant function value means a higher similarity, so the input pattern p is judged to be similar to category B.

In this way, the arbitrarily given input character is treated as a common distribution, the common distribution is convolved with the feature-space distribution of each similar character, and identification is performed using the convolved distributions, so that the difference between similar characters can be identified with high accuracy.

Next, the configuration of the character identification device according to the first embodiment will be described. When identifying similar characters, the character identification device 100 of the first embodiment obtains the common distribution between the similar characters and convolves it with the feature-space distribution of each similar character.

さらに、畳み込み積分にて算出した値から固有値、固有ベクトルを算出し、算出した固有値、固有ベクトルにより、擬似ベイズ識別を行う。具体的に図を用いて説明する。図２は、実施例１に係る文字識別装置を示す図である。 Further, eigenvalues and eigenvectors are calculated from values calculated by convolution integration, and pseudo Bayes identification is performed using the calculated eigenvalues and eigenvectors. This will be specifically described with reference to the drawings. FIG. 2 is a diagram illustrating the character identification device according to the first embodiment.

図２に示した文字識別装置１００は、画像入力部１０１と、処理部１０２と、認識結果出力部１０３とを有する。画像入力部１０１は、文字データを含む文字画像を光学的に読取る入力装置で、画像入力部１０１が読込んだ文字画像は、処理部１０２に出力される。 The character identification device 100 illustrated in FIG. 2 includes an image input unit 101, a processing unit 102, and a recognition result output unit 103. The image input unit 101 is an input device that optically reads a character image including character data, and the character image read by the image input unit 101 is output to the processing unit 102.

なお、上述した文字画像に含まれる文字データは、例えば、図１に示した入力パターンｐに相当するものとする。 The character data included in the character image described above corresponds to, for example, the input pattern p shown in FIG.

処理部１０２は、画像入力部１０１から入力された文字画像に対し、従来技術を用いて文字認識を行い、取得した認識結果に対して、さらに、上述した「類似文字識別」を行う処理部で、類似文字テーブル１０２ａと、文字認識辞書１０２ｂと、文字認識処理部１０２ｃとを有する。 The processing unit 102 is a processing unit that performs character recognition on the character image input from the image input unit 101 using conventional techniques, and further performs the above-described “similar character identification” on the acquired recognition result. , A similar character table 102a, a character recognition dictionary 102b, and a character recognition processing unit 102c.

類似文字テーブル１０２ａは、類似文字のリストを記憶するテーブルで、所定の文字に対する類似文字が記憶されている。図３は、類似文字テーブルのデータ構造の一例を示す図である。 The similar character table 102a is a table that stores a list of similar characters, and stores similar characters for a predetermined character. FIG. 3 is a diagram illustrating an example of the data structure of the similar character table.

ID: 0 shown in FIG. 3 indicates the character category “鳥” (bird); the number of similar characters for “鳥” is 1, and the corresponding similar character category is “烏” (crow). Likewise, the similar character category for the character category “乎” is “平”, “ぱ” corresponds to “ば”, and “大” (large) corresponds to “犬” (dog) and “太” (thick).

The similar character categories themselves are registered symmetrically, each with its own similar characters. For example, it is stored that the number of characters similar to “烏” is one, namely “鳥”.

Likewise, it is stored that the number of characters similar to “犬” is two, namely “大” and “太”.
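The similar character table can be pictured as a simple mapping from a character category to its list of similar categories. The sketch below uses only the entries named in the text; the dictionary layout and the helper name `similar_count` are illustrative assumptions, not the data format of FIG. 3.

```python
# Entries taken from the description of FIG. 3; the layout is an assumption.
SIMILAR_TABLE = {
    "鳥": ["烏"],
    "烏": ["鳥"],        # symmetric entry, as described for 烏
    "乎": ["平"],
    "ぱ": ["ば"],
    "大": ["犬", "太"],
    "犬": ["大", "太"],  # symmetric entry, as described for 犬
}

def similar_count(ch):
    """Number of similar characters registered for ch (0 if ch is not in the table)."""
    return len(SIMILAR_TABLE.get(ch, []))
```

A character that is absent from the table, such as one with no confusable counterpart, simply yields a count of zero.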

図２の説明に戻り、文字認識辞書１０２ｂについて説明する。文字認識辞書１０２ｂは、各文字カテゴリの平均ベクトルと、各文字カテゴリに対応する固有値と固有ベクトルを記憶するテーブルである。具体的に図を用いて説明する。 Returning to FIG. 2, the character recognition dictionary 102b will be described. The character recognition dictionary 102b is a table that stores an average vector of each character category, eigenvalues and eigenvectors corresponding to each character category. This will be specifically described with reference to the drawings.

FIG. 4 is a diagram for explaining an example of the data structure of the character recognition dictionary. The character recognition dictionary 102b in FIG. 4 stores the “average vector”, “eigenvalue”, and “eigenvector” of each similar character.

そして、図１に示したカテゴリＡ、Ｂを例に挙げると、カテゴリＡに対応する平均ベクトル「ａ」、カテゴリＡの分布に対応する固有値「ｋ１」、固有ベクトル「ｘ１」が記憶されている。 Taking the categories A and B shown in FIG. 1 as an example, an average vector “a” corresponding to the category A, an eigenvalue “k1” corresponding to the distribution of the category A, and an eigenvector “x1” are stored.

一方、カテゴリＢについては、カテゴリＢに対応する平均ベクトル「ｂ」、カテゴリＢの分布に対応する固有値「ｋ２」、固有ベクトル「ｘ２」が記憶されている。 On the other hand, for category B, an average vector “b” corresponding to category B, an eigenvalue “k2” corresponding to the distribution of category B, and an eigenvector “x2” are stored.

次に、図２の説明に戻り、文字認識処理部１０２ｃについて説明する。文字認識処理部１０２ｃは、入力された文字画像に含まれる文字を認識し、取得した認識結果に対し、類似文字識別を行う処理部である。 Next, returning to the description of FIG. 2, the character recognition processing unit 102c will be described. The character recognition processing unit 102c is a processing unit that recognizes characters included in the input character image and performs similar character identification on the acquired recognition result.

まず、入力された文字画像（例えば、入力パターンｐを含むものとする）に対して、従来技術を用いた文字認識処理を行い、この入力パターンｐに対する複数の文字認識候補（以下、単に文字候補とする）を取得する。 First, a character recognition process using conventional technology is performed on an input character image (for example, including an input pattern p), and a plurality of character recognition candidates for the input pattern p (hereinafter simply referred to as character candidates). ) To get.

そして、取得した複数の文字候補において、第１位の文字候補が類似文字テーブル１０２ａに登録されているか否かを判定し、判定結果により、第１位の文字候補が類似文字テーブル１０２ａに登録されていない場合は、第１位の文字候補を入力パターンｐの文字認識結果とする。 Then, in the plurality of acquired character candidates, it is determined whether or not the first character candidate is registered in the similar character table 102a. Based on the determination result, the first character candidate is registered in the similar character table 102a. If not, the first character candidate is taken as the character recognition result of the input pattern p.

On the other hand, when the first-ranked character candidate is registered in the similar character table 102a, it is determined whether any of the character candidates ranked second through Nth is registered in the similar character table 102a as a character similar to the first-ranked character candidate.

If no character similar to the first-ranked character candidate is found, the first-ranked character candidate is output as the character recognition result of the input pattern p.

On the other hand, if such a similar character is found, the highest-ranked one among the candidates ranked second through Nth is selected, and similar character identification is performed between the selected character candidate and the first-ranked character candidate.

The processing of the character recognition processing unit 102c described above will now be explained with a specific example. For example, when “猫” (cat) is acquired as the first-ranked character candidate for the input pattern p, the similar character table 102a is searched for “猫”.

If the search shows that “猫” is not registered in the similar character table 102a, the character recognition processing unit 102c takes “猫” as the character recognition result of the input pattern p.

On the other hand, if “猫” is registered in the similar character table 102a, the second- through Nth-ranked character candidates for the input pattern p are examined to determine whether any of them is registered in the similar character table 102a as a character similar to the first-ranked candidate “猫”.

If no character similar to the first-ranked candidate “猫” is registered, the first-ranked candidate “猫” is taken as the character recognition result of the input pattern p.

On the other hand, suppose that among the second- through Nth-ranked candidates, “描” (draw) is acquired as the second-ranked candidate and “錨” (anchor) as the sixth-ranked candidate, and that both are registered in the similar character table 102a. In that case, the following similar character identification is performed.

Similar character identification is performed between the first-ranked candidate “猫” and the second-ranked candidate “描”, which is the highest-ranked of the second- through Nth-ranked candidates. If the result shows that the input pattern p is more similar to the first-ranked candidate “猫”, “猫” is taken as the character recognition result of the input pattern p.

On the other hand, if the input pattern p is more similar to the second-ranked candidate “描”, “描” is taken as the character recognition result of the input pattern p.

Note that whichever of “猫” and “描” becomes the character recognition result, similar character identification is not performed for the sixth-ranked candidate “錨”. Similar character identification between “錨” and the first-ranked candidate “猫” would be performed if “錨” were the second-ranked candidate.
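The candidate-selection logic described above can be sketched as follows. The function name `select_similar_pair` and the table passed to it are hypothetical; the table entries for “猫” mirror the hypothetical registration used in the example, not actual dictionary contents.

```python
def select_similar_pair(candidates, similar_table):
    """Return (first, rival): rival is the highest-ranked lower candidate that the
    table lists as similar to the first candidate, or None if there is none."""
    first = candidates[0]
    similar = similar_table.get(first)
    if not similar:
        # First candidate is not registered: accept it as the result.
        return first, None
    for cand in candidates[1:]:
        if cand in similar:
            # Only this highest-ranked similar candidate is compared with the first.
            return first, cand
    return first, None
```

When a rival is returned, similar character identification is run on that single pair; lower-ranked similar candidates, such as the sixth-ranked “錨” in the example, are not examined.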

次に、上述した類似文字識別について図を用いて説明する。図５は、類似文字の識別処理を説明するための図である。なお、任意の入力パターンｐが与えられた場合に、文字カテゴリＡ、文字カテゴリＢのどちらに類似するかを識別する場合について説明する。 Next, similar character identification described above will be described with reference to the drawings. FIG. 5 is a diagram for explaining similar character identification processing. A case will be described in which the character category A or the character category B is identified when an arbitrary input pattern p is given.

まず、文字認識処理部１０２ｃが類似文字識別を行う際、カテゴリＡとカテゴリＢの共通分布Σ_Ｃを算出する（ステップＳ５０）。この共通分布Σ_Ｃを求める一例として、単純平均の方法を用いて求める場合を例に挙げて説明する。 First, when performing similar character identification, the character recognition processing unit 102c calculates the common distribution Σ_C of category A and category B (step S50). As one way of obtaining this common distribution Σ_C, the simple averaging method is described here.

カテゴリＡの共分散行列をＣ_Ａとし、カテゴリＢの共分散行列をＣ_Ｂとした場合、Ｃ_ＡとＣ_Ｂの和として共分散行列の和：Ｃ_Ｘを求める。そして、求めたＣ_Ｘから、上位ｎ個の固有値と固有ベクトルを算出する。 Let C_A be the covariance matrix of category A and C_B the covariance matrix of category B. The sum of the covariance matrices, C_X, is obtained as the sum of C_A and C_B. The top n eigenvalues and eigenvectors are then calculated from the obtained C_X.

そして、算出した固有値と固有ベクトルから射影行列：Ｐ_Ｘを求めることにより、カテゴリＡとカテゴリＢの共通分布Σ_Ｃに対応する共分散行列とする。なお、これまで上述した単純平均の方法の他に以下に示す「共通化操作」によっても共通分布Σ_Ｃが求められ、この共通化操作の詳細については後述する。 A projection matrix P_X is then obtained from the calculated eigenvalues and eigenvectors and is used as the covariance matrix corresponding to the common distribution Σ_C of category A and category B. Besides the simple averaging method described so far, the common distribution Σ_C can also be obtained by the “commonization operation,” the details of which are described later.
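The simple averaging method of step S50 can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how P_X is built from the eigenvalues and eigenvectors, so the sketch assumes a rank-n reconstruction (the sum of λ_i Φ_i Φ_iᵀ over the top n eigenpairs). The function name and the sample matrices are hypothetical.

```python
import numpy as np

def common_distribution_simple(cov_a, cov_b, n_top):
    """Rank-n_top approximation of C_X = C_A + C_B (assumed form of P_X)."""
    cov_x = cov_a + cov_b                       # C_X: sum of the two covariance matrices
    eigvals, eigvecs = np.linalg.eigh(cov_x)    # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_top]   # indices of the n largest eigenvalues
    top_vals = eigvals[order]
    top_vecs = eigvecs[:, order]
    # Assumption: P_X = sum_i lambda_i * phi_i phi_i^T over the top n eigenpairs
    p_x = (top_vecs * top_vals) @ top_vecs.T
    return p_x, top_vals, top_vecs

# Hypothetical 3-dimensional covariance matrices for categories A and B
cov_a = np.diag([4.0, 1.0, 0.25])
cov_b = np.diag([2.0, 3.0, 0.25])
p_x, vals, vecs = common_distribution_simple(cov_a, cov_b, n_top=2)
```

With these sample matrices the weakest direction (the third axis) is dropped from the resulting common distribution.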

次に、入力パターンｐが示すベクトルの周辺にステップＳ５０にて求めた共通分布Σ_Ｃを仮定し、この共通分布Σ_Ｃを用いて、カテゴリＡとカテゴリＢが示す各分布に対し、畳み込み積分を行い、各カテゴリ分布の変形を計算する（ステップＳ５１）。 Next, the common distribution Σ_C obtained in step S50 is assumed around the vector indicated by the input pattern p, and each of the distributions of category A and category B is convolved with this common distribution Σ_C to calculate the deformation of each category distribution (step S51).

例えば、入力パターンｐの特徴ベクトルをｘとすると、特徴空間上の点ｘに関し、このｘを中心とする共通分布Ｄ_ｐを密度関数とするカテゴリＣの２次識別関数の期待値を識別関数値とする。 For example, let x be the feature vector of the input pattern p. For the point x in the feature space, the discriminant function value is taken to be the expected value of the quadratic discriminant function of category C, with the common distribution D_p centered on x as the density function.

この場合において、通常の２次識別関数を以下の式（１）で定義した場合、この式（１）から、さらに以下に示す式（２）が求められる。

In this case, when the ordinary quadratic discriminant function is defined by the following equation (1), the following equation (2) is derived from equation (1).

式（２）において、ｍ_Ｃは、カテゴリＣの平均ベクトルを示し、Σ_Ｃは、カテゴリＣの共分散行列を示し、Σは、共通分布Ｄ_ｐの共分散行列を示す。これは、通常の２次識別関数において、Σ_ＣをΣ+Σ_Ｃに置き換えたものに相当する。 In equation (2), m_C denotes the mean vector of category C, Σ_C denotes the covariance matrix of category C, and Σ denotes the covariance matrix of the common distribution D_p. Equation (2) corresponds to the ordinary quadratic discriminant function with Σ_C replaced by Σ + Σ_C.
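The images for equations (1) and (2) are not reproduced in this text. As a reading aid only, and assuming that the ordinary quadratic discriminant function takes its common textbook form (the constants and sign conventions of the original equations may differ), the relationship described above can be written as:

```latex
% Assumed textbook form of equation (1): ordinary quadratic discriminant function
g_C(x) = (x - m_C)^{\top} \Sigma_C^{-1} (x - m_C) + \log \lvert \Sigma_C \rvert

% Equation (2) as described in the text: \Sigma_C replaced by \Sigma + \Sigma_C
\tilde{g}_C(x) = (x - m_C)^{\top} (\Sigma + \Sigma_C)^{-1} (x - m_C)
                 + \log \lvert \Sigma + \Sigma_C \rvert
```

The replacement is consistent with the fact that convolving two normal distributions yields a normal distribution whose covariance matrix is the sum of the two covariance matrices.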

そして、文字認識処理部１０２ｃは、式（２）に基づいて、共通分布Σ_Ｃが示す共分散行列と、カテゴリＡの分布が示す共分散行列との和を計算し、固有値・固有ベクトルを新たに算出する。 Then, based on equation (2), the character recognition processing unit 102c calculates the sum of the covariance matrix of the common distribution Σ_C and the covariance matrix of the distribution of category A, and newly calculates the eigenvalues and eigenvectors.

一方、文字認識処理部１０２ｃは、カテゴリＢに対してもカテゴリＡと同様に、式（２）を用いて、カテゴリＢの分布が示す共分散行列と共通分布Σ_Ｃが示す共分散行列との和を計算し、固有値・固有ベクトルを新たに計算する。 Likewise, for category B, the character recognition processing unit 102c uses equation (2) to calculate the sum of the covariance matrix of the distribution of category B and the covariance matrix of the common distribution Σ_C, and newly calculates the eigenvalues and eigenvectors.

次に、文字認識処理部１０２ｃは、ステップＳ５１で求めた固有値・固有ベクトルに基づき、カテゴリＡ、カテゴリＢとの擬似ベイズ識別から入力パターンｐが、カテゴリＡもしくはカテゴリＢのどちらに類似するかの判定を行い、判定結果を認識結果出力部１０３に入力する（ステップＳ５２）。 Next, based on the eigenvalues and eigenvectors obtained in step S51, the character recognition processing unit 102c determines, by pseudo-Bayes discrimination against category A and category B, whether the input pattern p is more similar to category A or to category B, and passes the determination result to the recognition result output unit 103 (step S52).

例えば、図１に示したように、カテゴリＡに対応する識別関数値Ａが「０．６」で、カテゴリＢに対応する識別関数値Ａが「０．４」の場合、算出された識別関数値が小さい方が、類似度が高いことから、入力パターンｐは、カテゴリＢに類似することになる。 For example, as shown in FIG. 1, when the discriminant function value for category A is “0.6” and the discriminant function value for category B is “0.4”, the input pattern p is judged to be similar to category B, because a smaller discriminant function value indicates a higher similarity.
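Steps S51 and S52 can be illustrated with the following minimal sketch. It is a simplification: the text uses pseudo-Bayes discrimination built from eigenvalues and eigenvectors, whereas the sketch evaluates an assumed textbook quadratic discriminant function directly with the summed covariance matrix. The function names and the sample categories are hypothetical.

```python
import numpy as np

def modified_qdf(x, mean, cov_category, cov_common):
    """Quadratic discriminant value with the category covariance replaced by
    cov_common + cov_category (assumed textbook form; smaller = more similar)."""
    cov = cov_common + cov_category
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)           # log-determinant of the summed covariance
    return float(diff @ np.linalg.solve(cov, diff) + logdet)

def identify(x, categories, cov_common):
    """Return the label with the smallest discriminant value, plus all values."""
    scores = {label: modified_qdf(x, mean, cov, cov_common)
              for label, (mean, cov) in categories.items()}
    return min(scores, key=scores.get), scores

# Hypothetical two-dimensional categories and common distribution
cov_common = 0.5 * np.eye(2)
categories = {
    "A": (np.array([0.0, 0.0]), np.eye(2)),
    "B": (np.array([3.0, 0.0]), np.eye(2)),
}
label, scores = identify(np.array([2.5, 0.0]), categories, cov_common)
```

As in the example in the text, the category with the smaller discriminant value is selected.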

このように、入力パターンｐは、特徴空間上でひとつのベクトルとして考えられるが、入力パターンのベクトル周辺に共通分布を仮定することにより、カテゴリＡとカテゴリＢが示す各分布の形状に基づいた識別を行うことになる。 Thus, although the input pattern p can be regarded as a single vector in the feature space, assuming a common distribution around that vector makes the identification depend on the shapes of the distributions of category A and category B.

そして、入力パターンｐの共通分布と、カテゴリＡとカテゴリＢが示す各分布の形状を正規分布と仮定し、各カテゴリ分布の２次識別関数を共通分布による畳み込み積分をカテゴリごとに行う。 Then, assuming that the common distribution of the input pattern p and the distributions of category A and category B are all normal distributions, the quadratic discriminant function of each category distribution is convolved with the common distribution, category by category.

これは、カテゴリＡに対し、畳み込み積分を行って得られる関数は、共通分布が示す共分散行列とカテゴリＡが示す共分散行列との和に置き換えた２次識別関数に相当する。 For category A, the function obtained by the convolution corresponds to the quadratic discriminant function in which the covariance matrix is replaced with the sum of the covariance matrix of the common distribution and the covariance matrix of category A.

一方、カテゴリＢに対し、畳み込み積分を行って得られる関数は、共通分布が示す共分散行列とカテゴリＢが示す共分散行列との和に置き換えた２次識別関数に相当する。 Likewise, for category B, the function obtained by the convolution corresponds to the quadratic discriminant function in which the covariance matrix is replaced with the sum of the covariance matrix of the common distribution and the covariance matrix of category B.

次に、図２の説明に戻り、認識結果出力部１０３について説明する。認識結果出力部１０３は、処理部１０２が行う類似文字識別に基づいて、文字認識結果を出力する処理部である。 Next, returning to the description of FIG. 2, the recognition result output unit 103 will be described. The recognition result output unit 103 is a processing unit that outputs a character recognition result based on similar character identification performed by the processing unit 102.

なお、上述してきたように、共通分布を求める一例として、単純平均の方法を用いて求める場合を例に挙げて説明したが、以下に示す「共通化操作」によっても共通分布が算出される。 As described above, the simple averaging method was used as one example of obtaining the common distribution, but the common distribution can also be calculated by the “commonization operation” described below.

単純平均の方法を用いて共通分布を算出した場合、類似文字カテゴリＡと類似文字カテゴリＢが与えられたとき、カテゴリＡとカテゴリＢに共通する変形（共通分布）は、カテゴリＡの共分散行列とカテゴリＢの共分散行列の和を求め、上位ｎ個の固有値と固有ベクトルを算出する。 When the common distribution is calculated by the simple averaging method, given similar character categories A and B, the deformation common to category A and category B (the common distribution) is obtained by taking the sum of the covariance matrix of category A and the covariance matrix of category B and calculating its top n eigenvalues and eigenvectors.

しかし、上位ｎ個の固有値と固有ベクトルには、類似文字カテゴリ間の差異が含まれている場合があるので、類似文字カテゴリ間の差異が含まれない共通分布を得る方法の一例として、他方の分布に含まれない成分をカットする方法がある。 However, the top n eigenvalues and eigenvectors may contain differences between the similar character categories. One way of obtaining a common distribution that does not contain such differences is to cut, from each distribution, the components that are not contained in the other distribution.

例えば、入力パターンｐが、カテゴリＡ側に対して、極端な広がりを持っていた場合、単純平均方法で算出した共通分布は、カテゴリＡ側の寄与の影響を受けた共通分布となってしまう。 For example, if the input pattern p has an extremely wide spread toward the category A side, the common distribution calculated by the simple averaging method ends up skewed by the contribution of the category A side.

したがって、類似文字カテゴリ間の差異が含まれないような共通分布を算出する方法として、共通化操作がある。具体的には、他方の分布に含まれない成分をカットすることで、より正確な共通分布が算出される。以下、共通化操作について詳細に説明する。 The commonization operation is therefore a method for calculating a common distribution that does not contain differences between similar character categories. Specifically, a more accurate common distribution is calculated by cutting the components that are not contained in the other distribution. The commonization operation is described in detail below.

まず、カテゴリＡの分布に対して、カテゴリＢの分布を用いた共通化変換を行う。この際、カテゴリＡの共分散行列をＣ_Ａ、カテゴリＢの共分散行列をＣ_Ｂとした場合、まず、Ｃ_Ａから固有値・固有ベクトルに基づく変換行列Ｔ_ＣＡを算出する。 First, a commonization transform using the distribution of category B is applied to the distribution of category A. Let C_A be the covariance matrix of category A and C_B the covariance matrix of category B. A transformation matrix T_CA based on the eigenvalues and eigenvectors is first calculated from C_A.

そして、固有値・固有ベクトルに基づく変換行列Ｔを求める。この場合、共分散行列Ｃの固有値をλ_ｉ（ｉ＝１、２・・・、ｋ（≦ｎ））、固有ベクトルをΦ_ｉ（ｉ＝１、２・・・、ｋ（≦ｎ））としたときに、Φ_ｋ＋１・・・Φ_ｎをΦ_ｉ（ｉ＝１，２・・・、ｋ（≦ｎ））と正規直交基底になるようにとると、変換行列Ｔは以下の式（３）で表される。

The transformation matrix T based on eigenvalues and eigenvectors is obtained as follows. Let λ_i (i = 1, 2, ..., k (≤ n)) be the eigenvalues of a covariance matrix C and Φ_i (i = 1, 2, ..., k (≤ n)) the corresponding eigenvectors. If Φ_{k+1}, ..., Φ_n are chosen so that, together with Φ_i (i = 1, 2, ..., k), they form an orthonormal basis, the transformation matrix T is expressed by the following equation (3).

そして、以下の式（４）を用いて、カテゴリＢの分布のうち、カテゴリＡの分布に共通する部分を取り出す。

Then, a portion common to the category A distribution is extracted from the category B distribution using the following equation (4).

次に、カテゴリＢの分布に対して、カテゴリＡの分布を用いた共通化変換を行う。この場合、カテゴリＡにて説明した同様の処理を行うことで、以下の式（５）にて表される。

Next, a commonization transform using the distribution of category A is applied to the distribution of category B. Performing the same processing as described for category A yields the following equation (5).

そして、式（４）に示したＣ´_Ｂと、式（５）に示したＣ´_Ａに基づいて、Ｃ´_Ｂ+Ｃ´_Ａの上位ｎ個の固有値と固有ベクトルを算出する。そして、算出された固有値、固有ベクトルを用いて、共通分布が算出される。 Then, based on C′_B given by equation (4) and C′_A given by equation (5), the top n eigenvalues and eigenvectors of C′_B + C′_A are calculated. The common distribution is calculated from these eigenvalues and eigenvectors.
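Equations (3) to (5) are not reproduced in this text, so the exact transform cannot be shown. The following sketch only illustrates the stated idea of cutting components that are not contained in the other distribution. It makes a loud assumption: each covariance matrix is projected onto the subspace spanned by the top k eigenvectors of the other category, which stands in for the original transformation matrix T and may differ from it. All names and sample values are hypothetical.

```python
import numpy as np

def top_eigvecs(cov, k):
    """Eigenvectors belonging to the k largest eigenvalues, as columns."""
    vals, vecs = np.linalg.eigh(cov)
    return vecs[:, np.argsort(vals)[::-1][:k]]

def cut_to_subspace(cov_target, cov_other, k):
    """Keep only the part of cov_target inside the top-k eigenspace of cov_other.
    Hypothetical stand-in for equations (3)-(5), which are not reproduced."""
    phi = top_eigvecs(cov_other, k)
    proj = phi @ phi.T                    # orthogonal projector onto that subspace
    return proj @ cov_target @ proj

def common_distribution_commonized(cov_a, cov_b, k, n_top):
    cov_b_cut = cut_to_subspace(cov_b, cov_a, k)   # plays the role of C'_B
    cov_a_cut = cut_to_subspace(cov_a, cov_b, k)   # plays the role of C'_A
    vals, vecs = np.linalg.eigh(cov_a_cut + cov_b_cut)
    order = np.argsort(vals)[::-1][:n_top]         # top n eigenpairs of C'_B + C'_A
    return (vecs[:, order] * vals[order]) @ vecs[:, order].T

cov_a = np.diag([5.0, 1.0, 0.1])   # category A spreads mainly along axis 0
cov_b = np.diag([0.1, 1.0, 5.0])   # category B spreads mainly along axis 2
common = common_distribution_commonized(cov_a, cov_b, k=2, n_top=1)
```

In this toy case only the middle axis, along which both categories vary, survives in the common distribution. The simple sum of the two matrices would instead be dominated by the category-specific axes 0 and 2.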

次に、実施例１に示した文字識別装置１００の処理について説明する。図６は、実施例１に係る文字識別装置の処理を示すフローチャートである。 Next, processing of the character identification device 100 shown in the first embodiment will be described. FIG. 6 is a flowchart illustrating the process of the character identification device according to the first embodiment.

まず、文字識別装置１００が文字画像を取得し、取得した文字画像（例えば、入力パターンｐを含むものとする）に対して通常の文字認識処理を行う（ステップＳ１００）。そして、入力パターンｐに対する複数の文字認識候補を取得する（ステップＳ１０１）。 First, the character identification device 100 acquires a character image, and performs normal character recognition processing on the acquired character image (for example, including the input pattern p) (step S100). Then, a plurality of character recognition candidates for the input pattern p are acquired (step S101).

そして、複数の文字候補の中で、第１位の文字候補が類似文字テーブル１０２ａに登録されているか否かを判定し、判定結果により、第１位の文字候補が登録されていない場合（ステップＳ１０２、Ｎｏ）は、第１位の文字候補を文字認識の結果とする。 Then, it is determined whether the first-ranked character candidate among the plurality of character candidates is registered in the similar character table 102a. If it is not registered (step S102, No), the first-ranked character candidate is taken as the character recognition result.

一方、第１位の文字候補が類似文字テーブル１０２ａに登録されている場合（ステップＳ１０２、Ｙｅｓ）は、ステップＳ１０１で取得した第２位から第ｎ位までの文字候補の中に、第１位の文字候補に類似する文字が類似文字テーブル１０２ａに登録されているか否かを判定する（ステップＳ１０３）。 On the other hand, if the first-ranked character candidate is registered in the similar character table 102a (step S102, Yes), it is determined whether any of the second- to n-th-ranked character candidates acquired in step S101 is registered in the similar character table 102a as a character similar to the first-ranked character candidate (step S103).

判定結果により、類似文字テーブル１０２ａに第１位の文字候補の類似する文字が登録されていない場合（ステップＳ１０４、Ｎｏ）、第ｎ+１位に登録されている文字候補を取得する（ステップＳ１０５）。 If the determination shows that no character similar to the first-ranked character candidate is registered in the similar character table 102a (step S104, No), the character candidate ranked (n+1)-th is acquired (step S105).

そして、ステップＳ１０１にて取得した全ての候補文字に対して処理を行っている場合（ステップＳ１０６、Ｎｏ）、第１位の文字候補を認識結果として出力する。 If all candidate characters acquired in step S101 have been processed (step S106, No), the first-ranked character candidate is output as the recognition result.

一方、ステップＳ１０１にて取得した全ての候補文字に対して処理を行っていない場合（ステップＳ１０６、Ｙｅｓ）、ステップＳ１０４に移行する。 On the other hand, if not all candidate characters acquired in step S101 have been processed (step S106, Yes), the process returns to step S104.

また、ステップＳ１０３の判定結果により、第２位から第Ｎ位までに含まれる文字候補の文字であって、上述した第１位の文字候補に類似する文字が、類似文字テーブル１０２ａに登録されている場合（ステップＳ１０４、Ｙｅｓ）、第２位から第Ｎ位までに含まれる文字候補で一番上位候補の文字と、第１位の文字候補との類似文字識別を行う(ステップＳ１０７)。 If the determination in step S103 shows that a character that is among the second- to N-th-ranked character candidates and is similar to the first-ranked character candidate is registered in the similar character table 102a (step S104, Yes), similar character identification is performed between the highest-ranked such candidate and the first-ranked character candidate (step S107).
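The candidate-selection flow of steps S102 to S107 can be sketched as follows. The sketch models the similar character table as a dictionary from a character to the set of characters registered as similar to it, and takes the pairwise identification as a callback. Both choices, the function names, and the sample ranking (apart from “猫”, “描”, and “錨” at ranks 1, 2, and 6) are assumptions made for illustration.

```python
def recognize(candidates, similar_table, identify_pair):
    """candidates: characters ordered from rank 1 downward.
    similar_table: dict mapping a character to the set of characters
    registered as similar to it (hypothetical layout of table 102a)."""
    first = candidates[0]
    similar = similar_table.get(first)
    if not similar:                        # S102 No: rank 1 is not registered
        return first
    for other in candidates[1:]:           # S103-S106: scan the lower ranks in order
        if other in similar:               # S104 Yes: highest-ranked similar candidate
            return identify_pair(first, other)   # S107: similar character identification
    return first                           # all candidates processed, none similar

# Hypothetical data: only the ranks of "猫", "描" and "錨" follow the example in the text
table = {"猫": {"描", "錨"}}
ranked = ["猫", "描", "犬", "大", "狐", "錨"]
calls = []

def fake_identify(a, b):
    """Stub for the pairwise identification; always prefers the second character."""
    calls.append((a, b))
    return b

result = recognize(ranked, table, fake_identify)
```

Only one pairwise identification is run, against the highest-ranked similar candidate. The sixth-ranked “錨” is never compared, which matches the example given earlier.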

このフローチャートによれば、文字識別装置１００が、「鳥（からす）」や「鳥（とり）」といった類似文字が示す特徴の差異を、特徴空間上にて取得することで、各類似文字を高精度に識別できる。 According to this flowchart, the character identification device 100 captures, in the feature space, the differences between the features of similar characters such as “烏” (crow) and “鳥” (bird), and can thereby identify each similar character with high accuracy.

次に、実施例２に示す文字識別装置について説明する。図７は、実施例２に係る文字識別装置を示す図である。図７に示した文字識別装置２００は、ユーザが指定する類似文字に基づき、類似文字識別を行う文字識別装置である。 Next, the character identification device shown in Example 2 will be described. FIG. 7 is a diagram illustrating the character identification device according to the second embodiment. The character identification device 200 shown in FIG. 7 is a character identification device that performs similar character identification based on similar characters designated by the user.

この「類似文字識別」とは、実施例１で示したように、任意の文字を、互いに類似する文字カテゴリのどちらかに属するかを識別することを示す。 As described in the first embodiment, “similar character identification” means identifying which of two mutually similar character categories an arbitrary character belongs to.

そして、文字識別装置２００は、画像入力部２０１と、処理部２０２と、認識結果出力部２０３と、インターフェース２０４とを有する。 The character identification device 200 includes an image input unit 201, a processing unit 202, a recognition result output unit 203, and an interface 204.

また、文字識別装置２００は、図２に示した文字識別装置１００とほぼ同等の機能を有し、この文字識別装置１００と同等の機能を有する機能部についての詳細な説明は省略する。 Moreover, the character identification device 200 has substantially the same function as the character identification device 100 shown in FIG. 2, and detailed description of the functional unit having the same function as the character identification device 100 is omitted.

具体的に、文字識別装置１００との違いは、処理部２０２に類似文字認識辞書２０２ｃを有すること、インターフェース２０４を有していることである。以下、類似文字認識辞書２０２ｃと、インターフェース２０４について説明する。 Specifically, the differences from the character identification device 100 are that the processing unit 202 has a similar character recognition dictionary 202c and that the device has an interface 204. The similar character recognition dictionary 202c and the interface 204 are described below.

図８は、類似文字認識辞書のデータ構造の一例を示す図である。図８に示した類似文字認識辞書２０２ｃは、互いに類似する２つの文字カテゴリが記憶されている。 FIG. 8 is a diagram illustrating an example of the data structure of the similar character recognition dictionary. The similar character recognition dictionary 202c shown in FIG. 8 stores two character categories similar to each other.

そして、この２つの文字カテゴリに登録されている文字は、入力パターンｐが与えられた場合において、ユーザが、入力パターンｐと、対応する２つの文字カテゴリに登録されている文字とが正しく識別されることを希望する文字データを示す。 The characters registered in these two character categories are character data for which the user wants a given input pattern p to be correctly identified against the characters registered in the corresponding two character categories.

「サンプル数」は、対応する文字カテゴリに類似する類似文字データの個数を示し、例えば、「鳥（とり）」と「烏（からす）」に対応するサンプル数は「２」であることを示している。 “Number of samples” indicates the number of similar character data similar to the corresponding character category. For example, the number of samples corresponding to “bird” and “crow” is “2”. Show.

そして、「サンプル数」と同じ個数のベクトルが共通ベクトルとして記憶されている。なお、上述した「サンプル数」、図８に示した「共通ベクトル」における詳細なデータ構造については、後述の図９にて説明する。 The same number of vectors as the “number of samples” are stored as common vectors. The detailed data structure of the “number of samples” described above and the “common vector” shown in FIG. 8 will be described later with reference to FIG.
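One possible in-memory layout for such a dictionary is sketched below. The text fixes only the fields (two character categories, a sample count, and as many common vectors as samples); the class names, the unordered-pair key, and the sample vectors are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional

@dataclass
class SimilarPairEntry:
    """One row of the dictionary: two similar categories and their common vectors."""
    category_a: str
    category_b: str
    common_vectors: List[List[float]] = field(default_factory=list)

    @property
    def sample_count(self) -> int:
        # The sample count equals the number of stored common vectors
        return len(self.common_vectors)

class SimilarCharDictionary:
    """Hypothetical container keyed by the unordered pair of categories."""

    def __init__(self) -> None:
        self._entries: Dict[FrozenSet[str], SimilarPairEntry] = {}

    def register(self, a: str, b: str, common_vector: List[float]) -> None:
        entry = self._entries.setdefault(frozenset((a, b)), SimilarPairEntry(a, b))
        entry.common_vectors.append(list(common_vector))

    def lookup(self, a: str, b: str) -> Optional[SimilarPairEntry]:
        return self._entries.get(frozenset((a, b)))

dictionary = SimilarCharDictionary()
dictionary.register("鳥", "烏", [0.1, 0.2])   # made-up vectors, for illustration only
dictionary.register("烏", "鳥", [0.3, 0.4])   # same pair, given in the other order
entry = dictionary.lookup("鳥", "烏")
```

Keying on the unordered pair means a lookup succeeds regardless of which of the two characters is the first-ranked candidate.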

続いて、図７に示したインターフェース２０４について図を用いて説明する。図９は、インターフェースの一例を示す図である。図９に示したインターフェース２０４は、ユーザが、正しく識別されることを希望する文字カテゴリと、その文字カテゴリに類似する類似文字データとをそれぞれ画像の組で、ユーザが指定するインターフェースである。 Next, the interface 204 illustrated in FIG. 7 will be described with reference to the drawings. FIG. 9 is a diagram illustrating an example of an interface. The interface 204 shown in FIG. 9 is an interface in which the user designates a character category that the user desires to be correctly identified and similar character data similar to the character category, each as a set of images.

図９に示した例では、ウインドウ上部の「文字の指定」は、ユーザがキーボード等（図示省略）を用いて、文字を入力するインターフェースを示す。そして、文字カテゴリＡに「ば」が、文字カテゴリＢに「ぱ」が入力される。 In the example shown in FIG. 9, “character designation” in the upper part of the window is an interface through which the user inputs characters using a keyboard or the like (not shown). Here, “ば” is entered for character category A and “ぱ” for character category B.

したがって、任意の入力パターンｐが与えられた場合において、ユーザは、この入力パターンｐが、「ぱ」と「ば」のどちらに属するかについて、正確に識別されること希望している。 Accordingly, when an arbitrary input pattern p is given, the user wants it to be accurately identified whether the input pattern p belongs to “ぱ” or “ば”.

そして、入力パターンｐが、「ぱ」と「ば」のどちらに属するかが識別される際に、「文字の指定」に登録されている「ぱ」と「ば」の組に相当する類似文字データをインターフェース２０４の下部に示す「画像の指定」に登録する。 So that it can be identified whether the input pattern p belongs to “ぱ” or “ば”, similar character data corresponding to the “ぱ”/“ば” pair registered under “character designation” is registered under “image designation,” shown in the lower part of the interface 204.

この「画像の指定」における「サンプルＩＤ」が示す番号は、図８の「サンプル数」に相当する。また、上述した「類似文字データ」とは、「画像の指定」によって登録されるデータを示す。 The number indicated by the “sample ID” in the “designation of image” corresponds to the “number of samples” in FIG. The “similar character data” described above indicates data registered by “designation of image”.

そして、「画像の指定」に登録された画像の組に基づいて、文字認識処理部２０２ｄが共通ベクトルを計算し、図８の類似文字認識辞書２０２ｃに算出した共通ベクトルを登録する。 The character recognition processing unit 202d calculates a common vector based on the set of images registered in the “designation of image”, and registers the calculated common vector in the similar character recognition dictionary 202c in FIG.

この共通ベクトルの算出について、文字認識処理部２０２ｄが行う処理を具体的に説明する。まず、与えられた２つの画像から得られる特徴ベクトルをｘ_Ａとｘ_Ｂとした場合、共通ベクトルを以下に示す式（６）にて算出する。

The processing performed by the character recognition processing unit 202d for calculating the common vector will be specifically described. First, assuming that feature vectors obtained from two given images are x _A and x _B , a common vector is calculated by the following equation (6).

なお、上述した式（６）において、（（ｘ_Ａ−ｘ_Ｂ）・ｘ_Ａ）は、ｘ_Ａ−ｘ_Ｂと、ｘ_Ａとのベクトルの内積を示し、（ｘ_Ａ−ｘ_Ｂ）／│ｘ_Ａ−ｘ_Ｂ│は、単位ベクトルを示すものとする。 In the above equation (6), ((x _A −x _B ) · x _A ) represents an inner product of vectors of x _A −x _B and x _A, and (x _A −x _B ) / | x _A −x _B | represents a unit vector.

また、式（６）にて示した各共通ベクトルは、同じ値を有することになり、いずれか一方の共通ベクトルを算出すれば、対応する共通ベクトルが求められる。 The common vectors given by equation (6) have the same value, so calculating either one of them yields the other.
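Equation (6) itself is not reproduced in this text. A reading consistent with the description (an inner product with x_A − x_B, a unit vector along x_A − x_B, and two results that coincide) is that each feature vector has its component along the difference direction removed. The sketch below assumes that reading; the function name and the sample vectors are hypothetical.

```python
import numpy as np

def common_vector(x_a, x_b):
    """Remove from x_a its component along the unit vector (x_a - x_b)/|x_a - x_b|.
    Assumed reading of equation (6); undefined when x_a equals x_b."""
    diff = x_a - x_b
    unit = diff / np.linalg.norm(diff)        # unit vector along the difference
    return x_a - np.dot(unit, x_a) * unit     # subtract the projection onto it

# Hypothetical feature vectors extracted from the two registered images
x_a = np.array([3.0, 1.0, 2.0])
x_b = np.array([1.0, 1.0, 2.0])
c_a = common_vector(x_a, x_b)
c_b = common_vector(x_b, x_a)
```

Under this reading the two results are identical, because the two feature vectors differ only along the direction that is removed.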

そして、文字認識処理部２０２ｄは、式（６）を用いて算出した共通ベクトルを図８に示した類似文字認識辞書２０２ｃに登録する。 Then, the character recognition processing unit 202d registers the common vector calculated using Equation (6) in the similar character recognition dictionary 202c shown in FIG.

次に、文字認識処理部２０２ｄが行う類似文字識別について説明する。まず、文字認識処理部２０２ｄは、任意の入力パターンｐが与えられた場合に、入力パターンｐに類似する文字カテゴリの組が類似文字認識辞書２０２ｃに登録されているか否かを検索する。 Next, similar character identification performed by the character recognition processing unit 202d will be described. First, when an arbitrary input pattern p is given, the character recognition processing unit 202d searches whether a set of character categories similar to the input pattern p is registered in the similar character recognition dictionary 202c.

そして、検索結果により、登録されている場合、文字認識処理部２０２ｄは、入力パターンｐに類似する文字カテゴリの組に対応する共通ベクトルを取得し、以下の式（７）を用いて共通分布に追加する。

If the search finds that such a set is registered, the character recognition processing unit 202d acquires the common vector corresponding to the set of character categories similar to the input pattern p and adds it to the common distribution using the following equation (7).

式（７）において、「Ｃ」は、実施例１で示した共通分布（例えば、Σ_Ｃ）に対応し、この共通分布に対応する新しい共通分布がＣ´となる。なお、共通分布Σ_Ｃを求める方法は、実施例１で示したように、単純平均の方法を用いて算出しても良く、「共通化操作」を用いて算出しても良く、詳細な算出方法については、実施例１と同様であるので省略する。 In equation (7), “C” corresponds to the common distribution described in the first embodiment (for example, Σ_C), and C′ is the new common distribution derived from it. As described in the first embodiment, the common distribution Σ_C may be calculated either by the simple averaging method or by the “commonization operation”; the detailed calculation is the same as in the first embodiment and is omitted here.
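Equation (7) is not reproduced in this text, so the exact update is unknown. One natural way to add a vector to a covariance-type matrix is a rank-one update, and the sketch below assumes that form. The `weight` parameter and the function name are hypothetical and do not come from the source.

```python
import numpy as np

def add_common_vector(cov_common, vec, weight=1.0):
    """Assumed rank-one update C' = C + weight * v v^T.
    Not taken from the source; equation (7) may be defined differently."""
    v = np.asarray(vec, dtype=float)
    return cov_common + weight * np.outer(v, v)

c = np.eye(2)                               # hypothetical common distribution C
c_new = add_common_vector(c, [0.0, 2.0])    # hypothetical common vector
```

Under this assumption the update widens the common distribution only along the direction of the registered common vector.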

その後、文字認識処理部２０２ｄは、式（７）で求めた共通分布を用いて、文字カテゴリ「Ａ」（例えば、“ば”）と、文字カテゴリ「Ｂ」（例えば：“ぱ”）とに対して、畳み込み積分を行う。そして、畳み込み積分によって得られる固有値、固有ベクトルに基づいて、擬似ベイズ識別を行う。 After that, the character recognition processing unit 202d uses the common distribution obtained by equation (7) to perform the convolution on character category “A” (for example, “ば”) and character category “B” (for example, “ぱ”). Pseudo-Bayes discrimination is then performed based on the eigenvalues and eigenvectors obtained by the convolution.

なお、図９に示した例では、類似文字データを画像データにより示したが、これは、説明の便宜上、画像データを例に挙げたに過ぎず、他のデータによって置き換えたとしても本実施例に示す機能が得られるものとする。 Although the example in FIG. 9 shows the similar character data as image data, image data is used only for convenience of explanation; the functions described in this embodiment are obtained even if it is replaced with other data.

次に、実施例２に示した文字識別装置２００に処理について説明する。図１０は、実施例に係る文字識別装置の処理を示すフローチャートである。 Next, the processing of the character identification device 200 of the second embodiment is described. FIG. 10 is a flowchart showing the processing of the character identification device according to the embodiment.

あらかじめ、ユーザがインターフェース２０４を介して、正しく認識してほしい文字カテゴリと、その文字カテゴリに類似する類似文字カテゴリを画像の組とした類似文字データを登録しておく。 In advance, the user registers a character category that the user wants to recognize correctly through the interface 204 and similar character data in which a similar character category similar to the character category is a set of images.

そして、文字認識処理部２０２ｄが、登録された類似文字データに基づいて、共通ベクトルを計算し、算出した共通ベクトルを類似文字認識辞書２０２ｃに登録する（ステップＳ２００）。 Then, the character recognition processing unit 202d calculates a common vector based on the registered similar character data, and registers the calculated common vector in the similar character recognition dictionary 202c (step S200).

そして、文字識別装置２００が文字画像を取得し、取得した文字画像に対して通常の文字認識処理を行い（ステップＳ２０１）、複数の文字認識候補を取得する（ステップＳ２０２）。 The character identification device 200 acquires a character image, performs normal character recognition processing on the acquired character image (step S201), and acquires a plurality of character recognition candidates (step S202).

続いて、複数の文字候補の中で、第１位の文字候補が類似文字テーブル２０２ｂに登録されているか否かを判定し、判定結果より、第１位の文字候補が登録されていない場合（ステップＳ２０３、Ｎｏ）は、第１の文字候補を文字認識の結果とする。 Next, it is determined whether the first-ranked character candidate among the plurality of character candidates is registered in the similar character table 202b. If it is not registered (step S203, No), the first-ranked character candidate is taken as the character recognition result.

一方、第１位の文字候補が登録されている場合（ステップＳ２０３、Ｙｅｓ）は、第２位から第ｎ位までの文字候補の中に、第１位の文字候補に類似する文字が類似文字テーブル２０２ｂに登録されているか否かを判定する（ステップＳ２０４）。 On the other hand, if the first-ranked character candidate is registered (step S203, Yes), it is determined whether any of the second- to n-th-ranked character candidates is registered in the similar character table 202b as a character similar to the first-ranked character candidate (step S204).

判定結果により、類似文字テーブル２０２ｂに第１位の文字候補に類似する文字が登録されていない場合（ステップＳ２０５、Ｎｏ）、第ｎ+１位に登録されている文字を取得する（ステップＳ２０６）。 If the determination shows that no character similar to the first-ranked character candidate is registered in the similar character table 202b (step S205, No), the character ranked (n+1)-th is acquired (step S206).

そして、ステップＳ２０１にて取得した全ての候補文字に対して処理を行っている場合（ステップＳ２０７、Ｎｏ）、第１位の文字候補を認識結果として出力する。 If all candidate characters acquired in step S201 are processed (No in step S207), the first character candidate is output as a recognition result.

一方、ステップＳ２０１にて取得した全ての候補文字に対して処理を行っていない場合（ステップＳ２０７、Ｙｅｓ）、ステップＳ２０５に移行する。 On the other hand, when processing has not been performed for all candidate characters acquired in step S201 (step S207, Yes), the process proceeds to step S205.

そして、第２位から第ｎ位までの文字候補であって、第１位の文字候補に類似する文字が類似文字テーブル２０２ｂに記憶されている場合（ステップＳ２０５、Ｙｅｓ）、類似文字の識別を行う(ステップＳ２０８)。 If a character that is among the second- to n-th-ranked character candidates and is similar to the first-ranked character candidate is stored in the similar character table 202b (step S205, Yes), similar character identification is performed (step S208).

ステップＳ２０８において、文字認識処理部２０２ｄが類似文字の識別処理を行う場合、ステップＳ２００で登録した共通ベクトルが類似文字認識辞書２０２ｃに登録されているか否かを検索する。 In step S208, when the character recognition processing unit 202d performs similar character identification processing, it is searched whether or not the common vector registered in step S200 is registered in the similar character recognition dictionary 202c.

検索結果により、登録されている場合、文字認識処理部２０２ｄは、参照した共通ベクトルを共通分布に追加する。そして、新しく求めた共通分布に基づいて、類似文字識別を行う。 If the search finds that it is registered, the character recognition processing unit 202d adds the retrieved common vector to the common distribution. Similar character identification is then performed based on the newly obtained common distribution.

上述した実施例によれば、類似文字カテゴリ間に共通する特徴の情報を抽出し、類似文字カテゴリ間に共通する特徴を抑えつつ、特徴の差異が強調される識別を行うため、多次元的に広がった類似した文字カテゴリ間の特徴の差異を利用し、より高精度な類似文字の識別が可能となる。 According to the above-described embodiment, the feature information common to similar character categories is extracted, and the features common to similar character categories are suppressed, and the feature differences are emphasized. It is possible to identify similar characters with higher accuracy by utilizing the difference in characteristics between the expanded similar character categories.

ところで、本実施例において説明した各処理のうち、自動的に行われるものとして説明した処理の一部を手動的に行うこともでき、あるいは、手動的に行われるものとして説明した処理の全部あるいは一部を公知の方法で自動的に行うこともできる。この他、上記文書中や図面中で示した処理手順、制御手順、具体的名称、各種データを含む情報については、特記する場合を除いて任意に変更することができる。 By the way, among the processes described in the present embodiment, a part of the processes described as being automatically performed can be manually performed, or all of the processes described as being manually performed or A part can be automatically performed by a known method. In addition, the information including the processing procedure, control procedure, specific name, and various data shown in the above document and drawings can be arbitrarily changed unless otherwise specified.

また、図２に示した文字識別装置１００の各構成要素は機能概念的なものであり、必ずしも物理的に図示の如く構成されていることを要しない。すなわち、各装置の分散・統合の具体的形態は図示のものに限られず、その全部または一部を、各種の負荷や使用状況などに応じて、任意の単位で機能的または物理的に分散・統合して構成することができる。さらに、各装置にて行われる各処理機能は、その全部または任意の一部がＣＰＵおよび当該ＣＰＵにて解析実行されるプログラムにて実現され、あるいは、ワイヤードロジックによるハードウェアとして実現され得る。 Each component of the character identification device 100 shown in FIG. 2 is functionally conceptual and need not be physically configured as illustrated. In other words, the specific form of distribution and integration of each device is not limited to that shown in the figure; all or part of it may be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. Furthermore, all or any part of each processing function performed by each device may be realized by a CPU and a program analyzed and executed by that CPU, or may be realized as hardware using wired logic.

図１１は、実施例１に係る文字識別装置を構成するコンピュータのハードウェア構成を示す図である。図１１に示すように、このコンピュータ３００は、入力装置３０１、ディスプレイ３０２、ＲＡＭ（Random Access Memory）３０３、ＲＯＭ（Read Only Memory）３０４、ＨＤＤ（Hard Disk Drive）３０５、ＣＰＵ(Central Processing Unit)３０６、媒体読取装置３０７をバス３０８で接続している。 FIG. 11 is a diagram illustrating a hardware configuration of a computer constituting the character identification device according to the first embodiment. As shown in FIG. 11, the computer 300 includes an input device 301, a display 302, a RAM (Random Access Memory) 303, a ROM (Read Only Memory) 304, an HDD (Hard Disk Drive) 305, and a CPU (Central Processing Unit) 306. The medium reader 307 is connected by a bus 308.

そして、ＨＤＤ３０５には、上述した類似文字識別機能と同様の機能を発揮する文字認識プログラム３０５ａが記憶されている。ＣＰＵ３０６が、文字認識プログラム３０５ａを読み出して実行することにより、文字認識プロセス３０６ａが起動される。ここで、文字認識プロセス３０６ａは、図２に示した文字認識処理部１０２ｃに対応する。 The HDD 305 stores a character recognition program 305a that performs the same function as the similar character identification function described above. When the CPU 306 reads and executes the character recognition program 305a, the character recognition process 306a is activated. Here, the character recognition process 306a corresponds to the character recognition processing unit 102c shown in FIG.

尚、ＲＡＭ３０３には、処理部１０２に記憶されている各種データや、文字認識プロセス３０６ａによって利用されるデータを含んだ各種データ３０３ａを記憶している。ＣＰＵ３０６は、各種データ３０３ａに含まれる各文字カテゴリの平均ベクトル、固有値、固有ベクトルに基づいて類似文字の識別を行う。 The RAM 303 stores various data 303a including various data stored in the processing unit 102 and data used by the character recognition process 306a. The CPU 306 identifies similar characters based on the average vector, eigenvalue, and eigenvector of each character category included in the various data 303a.

ところで、図１１に示した文字認識プログラム３０５ａは、必ずしも最初からＨＤＤ３０５に記憶させておかなくても良い。たとえば、コンピュータに挿入されるフレキシブルディスク（ＦＤ）、ＣＤ−ＲＯＭ、ＤＶＤディスク、光磁気ディスク、ＩＣカードなどの「可搬用の物理媒体」、または、コンピュータの内外に備えられるハードディスクドライブ（ＨＤＤ）などの「固定用の物理媒体」、さらには、公衆回線、インターネット、ＬＡＮ、ＷＡＮなどを介してコンピュータに接続される「他のコンピュータ（またはサーバ）」などに記憶しておき、コンピュータが文字認識プログラム３０５ａを読み出して実行するようにしてもよい。 Incidentally, the character recognition program 305a shown in FIG. 11 need not be stored in the HDD 305 from the beginning. For example, the program may be stored on a “portable physical medium” such as a flexible disk (FD), CD-ROM, DVD, magneto-optical disk, or IC card inserted into the computer; on a “fixed physical medium” such as a hard disk drive (HDD) provided inside or outside the computer; or on “another computer (or server)” connected to the computer via a public line, the Internet, a LAN, a WAN, or the like, and the computer may read the character recognition program 305a from there and execute it.

以上の各実施例を含む実施形態に関し、さらに以下の付記を開示する。 The following supplementary notes are further disclosed with respect to the embodiments including the above examples.

（付記１）コンピュータに、
特徴空間上において、形状が異なり同意の文字を含む文字カテゴリ分布を、対比する文字ごとに特定し、各文字カテゴリ分布に共通する情報を共通分布として求める共通分布処理手順と、
前記共通分布と各文字カテゴリ分布を基にして各文字カテゴリ分布を修正する修正手順と、
識別対象となる文字を取得した場合に、当該文字の前記特徴空間上の位置と、前記修正手順により修正された各文字カテゴリ分布とを基にして、前記文字を識別する文字識別手順と
を実行させる文字識別プログラム。 (Supplementary note 1) A character identification program that causes a computer to execute:
a common distribution processing procedure of specifying, for each of the characters to be compared, a character category distribution in a feature space that includes characters having the same meaning but different shapes, and obtaining information common to the character category distributions as a common distribution;
a correction procedure of correcting each character category distribution based on the common distribution and each character category distribution; and
a character identification procedure of, when a character to be identified is acquired, identifying the character based on the position of the character in the feature space and each character category distribution corrected by the correction procedure.

（付記２）前記修正手順は、前記文字カテゴリ分布に対し、前記共通分布を用いた畳み込み積分を行って、当該文字カテゴリ分布を修正し、前記文字識別手順は、当該修正した文字カテゴリ分布に基づいて、前記文字を識別する付記１に記載の文字識別プログラム。 (Supplementary Note 2) The correction procedure performs convolution integration using the common distribution on the character category distribution to correct the character category distribution, and the character identification procedure is based on the corrected character category distribution. The character identification program according to supplementary note 1, for identifying the character.

（付記３）前記共通分布処理手順が、前記共通分布を求める際に、前記文字カテゴリ分布間の差異を減少させ、前記文字カテゴリ分布間に共通する情報に基づいて、前記共通分布を求める付記１または２に記載の文字識別プログラム。 (Supplementary Note 3) The character identification program according to Supplementary Note 1 or 2, wherein, when obtaining the common distribution, the common distribution processing procedure reduces the differences between the character category distributions and obtains the common distribution based on the information common to the character category distributions.
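One possible reading of Supplementary Note 3 can be sketched as follows. The assumption here, which is not confirmed by the text, is that the "difference" between category distributions is their location in the feature space, so each category is centered on its own mean and the residuals are pooled into a shared covariance. The helper names `mean_vector` and `pooled_covariance` and the sample data are hypothetical.

```python
def mean_vector(samples):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(samples)
    dim = len(samples[0])
    return [sum(s[d] for s in samples) / n for d in range(dim)]


def pooled_covariance(categories):
    """Centre each category on its own mean, then pool the residuals.

    Centring removes the between-category difference in location, so the
    pooled covariance describes only the spread the categories share.
    """
    dim = len(next(iter(categories.values()))[0])
    cov = [[0.0] * dim for _ in range(dim)]
    total = 0
    for samples in categories.values():
        mu = mean_vector(samples)
        for s in samples:
            r = [s[d] - mu[d] for d in range(dim)]
            for i in range(dim):
                for j in range(dim):
                    cov[i][j] += r[i] * r[j]
        total += len(samples)
    return [[c / total for c in row] for row in cov]


# Hypothetical 2-D feature vectors for two character categories A and B.
categories = {
    "A": [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)],
    "B": [(10.0, 10.0), (12.0, 10.0), (10.0, 12.0), (12.0, 12.0)],
}
common_cov = pooled_covariance(categories)  # [[1.0, 0.0], [0.0, 1.0]]
```

Although the two categories lie far apart, the pooled covariance depends only on how the samples scatter around their own category means, which is the shared information this sketch treats as the common distribution.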

（付記４）前記共通分布処理手順は、利用者によって入力された画像データに基づき、前記画像データに共通する情報を算出し、算出した情報を前記共通分布に加えて、共通分布を新たに求める付記１〜３のいずれか一つに記載の文字識別プログラム。 (Supplementary Note 4) The character identification program according to any one of Supplementary Notes 1 to 3, wherein the common distribution processing procedure calculates, based on image data input by a user, information common to the image data, adds the calculated information to the common distribution, and thereby obtains a new common distribution.
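The update described in Supplementary Note 4 could be sketched as a running summary to which statistics from user-supplied samples are added. This sketch assumes a one-dimensional feature, assumes that all samples in one call belong to the same character, and represents the common distribution only by its variance. The class `CommonSpread` and its fields are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonSpread:
    """Hypothetical running summary of a zero-mean, 1-D common distribution."""

    count: int      # number of samples pooled so far
    sum_sq: float   # sum of squared residuals around each category mean

    @property
    def variance(self):
        return self.sum_sq / self.count

    def add_user_samples(self, samples):
        """Return a new summary that also includes the user's samples.

        The samples are centred on their own mean first, so only their
        spread, not their location, is added to the common distribution.
        """
        mu = sum(samples) / len(samples)
        extra = sum((x - mu) ** 2 for x in samples)
        return CommonSpread(self.count + len(samples), self.sum_sq + extra)


common = CommonSpread(count=8, sum_sq=8.0)          # variance 1.0
updated = common.add_user_samples([4.0, 6.0, 8.0])  # residuals -2, 0, 2
```

Returning a new object rather than mutating the existing one keeps the previous common distribution available, which may be useful if an update needs to be compared against or rolled back to the earlier state.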

（付記５）特徴空間上において、形状が異なり同意の文字を含む文字カテゴリ分布を、対比する文字ごとに特定し、各文字カテゴリ分布に共通する情報を共通分布として求める共通分布処理部と、
前記共通分布と各文字カテゴリ分布を基にして各文字カテゴリ分布を修正する修正部と、
識別対象となる文字を取得した場合に、当該文字の前記特徴空間上の位置と、前記修正部により修正された各文字カテゴリ分布とを基にして、前記文字を識別する文字識別部と
を有する文字識別装置。 (Supplementary Note 5) A character identification device comprising:
a common distribution processing unit that specifies, for each character to be compared, a character category distribution in a feature space that contains characters differing in shape but having the same meaning, and obtains information common to the character category distributions as a common distribution;
a correction unit that corrects each character category distribution based on the common distribution and that character category distribution; and
a character identification unit that, when a character to be identified is acquired, identifies the character based on the position of the character in the feature space and each character category distribution corrected by the correction unit.

（付記６）前記修正部は、前記文字カテゴリ分布に対し、前記共通分布を用いた畳み込み積分を行って、当該文字カテゴリ分布を修正し、前記文字識別部は、当該修正した文字カテゴリ分布に基づいて、前記文字を識別する付記５に記載の文字識別装置。 (Supplementary Note 6) The character identification device according to Supplementary Note 5, wherein the correction unit corrects the character category distribution by performing a convolution integral of the character category distribution with the common distribution, and the character identification unit identifies the character based on the corrected character category distribution.

（付記７）前記共通分布処理部が、前記共通分布を求める際に、前記文字カテゴリ分布間の差異を減少させ、前記文字カテゴリ分布間に共通する情報に基づいて、前記共通分布を求める付記５または６に記載の文字識別装置。 (Supplementary Note 7) The character identification device according to Supplementary Note 5 or 6, wherein, when obtaining the common distribution, the common distribution processing unit reduces the differences between the character category distributions and obtains the common distribution based on the information common to the character category distributions.

（付記８）前記共通分布処理手順は、利用者によって入力された画像データに基づき、前記画像データに共通する情報を算出し、算出した情報を前記共通分布に加えて、共通分布を新たに求める付記５〜７のいずれか一つに記載の文字識別装置。 (Supplementary Note 8) The character identification device according to any one of Supplementary Notes 5 to 7, wherein the common distribution processing procedure calculates, based on image data input by a user, information common to the image data, adds the calculated information to the common distribution, and thereby obtains a new common distribution.

（付記９）文字識別装置が文字を識別する方法であって、
前記文字識別装置が、
特徴空間上において、形状が異なり同意の文字を含む文字カテゴリ分布を、対比する文字ごとに特定し、各文字カテゴリ分布に共通する情報を共通分布として求める共通分布処理ステップと、
前記共通分布と各文字カテゴリ分布を基にして各文字カテゴリ分布を修正する修正ステップと、
識別対象となる文字を取得した場合に、当該文字の前記特徴空間上の位置と、前記修正ステップにより修正された各文字カテゴリ分布とを基にして、前記文字を識別する文字識別ステップと
を実行する文字識別方法。 (Supplementary Note 9) A character identification method by which a character identification device identifies a character, the method comprising, by the character identification device, executing:
a common distribution processing step of specifying, for each character to be compared, a character category distribution in a feature space that contains characters differing in shape but having the same meaning, and obtaining information common to the character category distributions as a common distribution;
a correction step of correcting each character category distribution based on the common distribution and that character category distribution; and
a character identification step of, when a character to be identified is acquired, identifying the character based on the position of the character in the feature space and each character category distribution corrected by the correction step.
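The correction and identification steps of the method above can be sketched end to end under simplifying assumptions that are not stated in the specification: a one-dimensional feature, Gaussian category distributions, a zero-mean Gaussian common distribution, and identification by maximum log-likelihood. The functions `log_gaussian`, `correct`, and `identify` and all numeric values are hypothetical.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def correct(categories, common_var):
    """Correction step: convolving a Gaussian with a zero-mean Gaussian
    common distribution keeps the mean and adds the variances."""
    return {name: (mean, var + common_var)
            for name, (mean, var) in categories.items()}


def identify(x, categories):
    """Identification step: pick the category with the highest log density at x."""
    return max(categories, key=lambda name: log_gaussian(x, *categories[name]))


# Hypothetical (mean, variance) pairs for two similar characters A and B.
# The variance of A is assumed to be underestimated, e.g. from too few samples.
raw = {"A": (0.0, 0.01), "B": (1.0, 1.0)}
corrected = correct(raw, common_var=0.5)

x = 0.4  # feature position of the character to be identified
before = identify(x, raw)        # "B": A's distribution is too narrow to reach x
after = identify(x, corrected)   # "A": after correction the nearer category wins
```

With these example values, the input at 0.4 lies closer to the mean of A, yet the uncorrected distributions select B because the variance of A is too small. After the correction widens both distributions by the common variance, A is selected.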

DESCRIPTION OF SYMBOLS
10 特徴空間 Feature space
50 共通分布 Common distribution
100, 200, 300 文字識別装置 Character identification device
101, 201 画像入力部 Image input unit
102, 202 処理部 Processing unit
102a, 202b 類似文字テーブル Similar character table
102b, 202a 文字認識辞書 Character recognition dictionary
103, 203 認識結果出力部 Recognition result output unit
202c 類似文字認識辞書 Similar character recognition dictionary
204 インターフェース Interface
301 入力装置 Input device
302 ディスプレイ Display
303 RAM
303a 各種データ Various data
304 ROM
305 HDD
305a 文字認識プログラム Character recognition program
306 CPU
306a 文字認識プロセス Character recognition process
307 媒体読取装置 Medium reading device
A, B 文字カテゴリ分布 Character category distribution

Claims

1. A character identification program that causes a computer to execute:
a common distribution processing procedure of specifying, for each character to be compared, a character category distribution in a feature space that contains characters differing in shape but having the same meaning, and obtaining information common to the character category distributions as a common distribution;
a correction procedure of correcting each character category distribution based on the common distribution and that character category distribution; and
a character identification procedure of, when a character to be identified is acquired, identifying the character based on the position of the character in the feature space and each character category distribution corrected by the correction procedure.

2. The character identification program according to claim 1, wherein the correction procedure corrects the character category distribution by performing a convolution integral of the character category distribution with the common distribution, and the character identification procedure identifies the character based on the corrected character category distribution.

3. The character identification program according to claim 1, wherein, when obtaining the common distribution, the common distribution processing procedure reduces the differences between the character category distributions and obtains the common distribution based on the information common to the character category distributions.

4. The character identification program according to any one of claims 1 to 3, wherein the common distribution processing procedure calculates, based on image data input by a user, information common to the image data, adds the calculated information to the common distribution, and thereby obtains a new common distribution.

5. A character identification device comprising:
a common distribution processing unit that specifies, for each character to be compared, a character category distribution in a feature space that contains characters differing in shape but having the same meaning, and obtains information common to the character category distributions as a common distribution;
a correction unit that corrects each character category distribution based on the common distribution and that character category distribution; and
a character identification unit that, when a character to be identified is acquired, identifies the character based on the position of the character in the feature space and each character category distribution corrected by the correction unit.