JPH09311909A

JPH09311909A - Character recognition method and device therefor

Info

Publication number: JPH09311909A
Application number: JP8150095A
Authority: JP
Inventors: Takuya Okamoto (岡本卓哉); Hisafumi Azuma (東尚史)
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1996-05-21
Filing date: 1996-05-21
Publication date: 1997-12-02

Abstract

PROBLEM TO BE SOLVED: To speed up a Japanese optical character reader (OCR). SOLUTION: Characters are grouped separately for each feature, which reduces the number of characters that must be matched at each stage of matching and so speeds up recognition. A group creating part 103 stores the grouping result for each feature in a group dictionary 104. A representative feature quantity dictionary creating part 105 stores the feature quantity representing each group in a representative feature quantity dictionary 106. When a character pattern 107 is input, a feature quantity extracting part 108 extracts its feature quantities. A matching processing part 109 reads the entries of the representative feature quantity dictionary 106 that correspond to the candidate characters in a candidate character storage table 110, performs matching, and stores each result in an evaluation value table 111 as the evaluation value of matching for every character belonging to that character group. A candidate character narrowing part 112 then narrows down the candidate characters. This processing is repeated, and the final candidate characters are output.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to character recognition technology used in OCR and similar systems, and more particularly to a character recognition method and apparatus that improve recognition speed in Japanese OCR.

[0002]

2. Description of the Related Art

Because Japanese uses a great many character types, it is difficult for a Japanese OCR (optical character reader) to achieve both fast and accurate character recognition. A recognition method with a small processing load is fast but yields a lower recognition rate. Conversely, a method that performs complex evaluation to improve accuracy has a large processing load and is therefore slower.

[0003] To achieve fast character recognition while maintaining a high recognition rate, a conventional approach extracts several features from a character pattern and classifies the pattern hierarchically, as coarse classification, intermediate classification, and detailed classification, starting with the features that are easy to process or have few dimensions. Such a method is disclosed in, for example, Japanese Patent Laid-Open No. 7-306916, "Recognition method using neural network".
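The conventional coarse-to-fine narrowing described above can be illustrated with a minimal Python sketch. The function name `cascade_classify`, the per-stage distance functions, and the rule of keeping a fixed number of best candidates after each stage are assumptions for illustration only; they are not taken from this document or from JP 7-306916.

```python
def cascade_classify(candidates, stages):
    """Generic coarse-to-fine narrowing, cheapest stage first (hypothetical).

    candidates: iterable of character numbers
    stages:     list of (distance function, number of candidates to keep);
                each distance function maps a character number to the distance
                between the input pattern and that character (lower is better)
    """
    candidates = list(candidates)
    for distance, keep in stages:
        # sorted() is stable, so tied candidates keep their current order.
        candidates = sorted(candidates, key=distance)[:keep]
    return candidates
```

Each stage scores only the candidates that survived the previous one, so the cost of an expensive later stage depends on how well the cheap early stages narrow the set, which is the weakness discussed in [0004].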

[0004]

Problems to Be Solved by the Invention

With the conventional hierarchical narrowing method, the features used for coarse classification have few dimensions and therefore little discriminating power. Narrowing can be strengthened by using many low-dimensional features in the coarse classification, but extracting them takes processing time, so overall recognition speed is hard to improve. In general, hierarchical narrowing uses higher-dimensional features in the later intermediate and detailed classification stages, so if the coarse classification does not narrow the candidates sufficiently, processing time increases. In particular, when recognizing characters such as kanji and kana, which comprise many character types and many mutually similar characters, conventional coarse classification does not narrow the candidates enough.

[0005] In view of the above problems in the prior art, an object of the present invention is to provide a character recognition method and apparatus capable of fast character recognition while achieving a high recognition rate.

[0006]

Means for Solving the Problems

To achieve the above object, the present invention groups characters having similar features in advance at each stage of the hierarchical classification described above, and adds a mechanism that reduces the number of characters to be matched at each stage, thereby improving recognition speed.

[0007] Specifically, the invention according to claim 1 is a character recognition method for recognizing a document image and converting it into code information, comprising: a step of extracting a plurality of features for character recognition from standard character patterns and storing the extracted feature quantities in a feature quantity dictionary; a step of creating, for each of the plurality of features stored in the feature quantity dictionary, a plurality of character groups by grouping characters that have similar values of that feature, and determining a representative character for each character group; a step of extracting the plurality of features for character recognition from a character pattern to be recognized that has been extracted from the document image; an evaluation value acquisition step of matching, for each feature, the feature quantity obtained from the character pattern to be recognized against the representative feature quantity (the feature quantity of the representative character) of each character group created for that feature, to obtain an evaluation value of the matching; and a candidate character narrowing step of narrowing down candidate characters by using each obtained evaluation value as the evaluation value of the matching between the character pattern to be recognized and every character belonging to the corresponding character group.
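A minimal Python sketch of the matching and narrowing steps of claim 1 for a single feature follows. It assumes that feature quantities are numeric vectors compared by Euclidean distance, that the evaluation value is 1 / (1 + distance) so that higher is better, and that narrowing uses a fixed threshold; the names `match_feature` and `euclidean` are hypothetical. The point it shows is that one matching per group yields evaluation values for every member of that group.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_feature(input_feat, groups, rep_feats, threshold):
    """Group-based matching for one feature (hypothetical sketch of claim 1).

    groups:    {representative char number: [member char numbers]}
    rep_feats: {representative char number: representative feature vector}
    Assumed evaluation value: 1 / (1 + distance), so higher is better.
    """
    scores = {}
    for rep, members in groups.items():
        # One matching against the representative feature quantity...
        value = 1.0 / (1.0 + euclidean(input_feat, rep_feats[rep]))
        # ...is reused as the evaluation value of every member of the group.
        for ch in members:
            scores[ch] = value
    candidates = sorted(ch for ch, v in scores.items() if v >= threshold)
    return candidates, scores
```

For example, with the first-feature grouping of FIG. 3 written as `{2: [2, 4], 3: [1, 3]}`, only two representative matchings are needed to obtain evaluation values for all four characters.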

[0008] The invention according to claim 2 is a character recognition method for recognizing a document image and converting it into code information, comprising: a step of extracting a plurality of features for character recognition from standard character patterns and storing the extracted feature quantities in a feature quantity dictionary; a step of creating, for each of the plurality of features stored in the feature quantity dictionary, a plurality of character groups by grouping characters having similar features, and determining a representative character for each character group; a step of creating, for each feature, a group dictionary storing information indicating the character group to which each character belongs; a step of creating, for each feature, a representative feature quantity dictionary storing the representative feature quantity (the feature quantity of the representative character) of each character group for that feature; a step of extracting the plurality of features for character recognition from a character pattern to be recognized that has been extracted from the document image; an evaluation value acquisition step of matching, for each feature, the feature quantity obtained from the character pattern to be recognized against the representative feature quantity of each character group for that feature, to obtain an evaluation value of the matching; a step of storing each obtained evaluation value in a per-character evaluation value table as the evaluation value of the matching between the character pattern to be recognized and every character belonging to the corresponding character group; and a candidate character narrowing step of narrowing down candidate characters by using the evaluation values stored in the evaluation value table for each character and each feature.

[0009] The invention according to claim 3 is the character recognition method according to claim 1 or 2, wherein, calling the feature quantities of the plurality of features for character recognition the n-th feature quantity (n = 1, 2, 3, ...): when the evaluation value acquisition step and the candidate character narrowing step, having performed matching and narrowing for the n-th feature quantity, go on to perform matching and narrowing for the (n+1)-th feature quantity, all character groups of the (n+1)-th feature quantity that contain a candidate character remaining after narrowing by the n-th feature quantity are determined, and the evaluation value acquisition step for the (n+1)-th feature quantity performs matching on the groups so determined.
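Claim 3 amounts to collecting, without duplicates, the (n+1)-th feature groups that still contain a candidate. A minimal Python sketch, assuming the group dictionary for that feature is held as a mapping from character number to group number (the representative's character number); `groups_to_match` is a hypothetical name:

```python
def groups_to_match(candidates, group_dict):
    """Groups of the (n+1)-th feature that contain a remaining candidate.

    candidates: char numbers left after narrowing by the n-th feature
    group_dict: {char number: group number} for the (n+1)-th feature
    Returns the group numbers in ascending order, without duplicates.
    """
    return sorted({group_dict[ch] for ch in candidates})
```

Only the returned groups are matched against their representative feature quantities; groups whose members were all eliminated at the n-th feature are skipped.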

[0010] The invention according to claim 4 is the character recognition method according to claim 1 or 2, wherein the characters are grouped as follows: for every character stored in the feature quantity dictionary, on the assumption that the character is a representative character, a temporary group is created from the characters lying within a predetermined threshold distance of it, and the number of characters belonging to that temporary group is counted; temporary groups are then adopted as regular groups in descending order of character count; and a character included in more than one group is assigned to the group whose representative character's feature quantity is closer to it.
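Claim 4 describes a greedy grouping procedure. The Python sketch below is one possible reading of it under stated assumptions: Euclidean distance; a temporary group is adopted only while it still covers some not-yet-covered character (the claim does not say when adoption stops); and ties are broken by the smaller character number. `group_characters` is a hypothetical name.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_characters(features, threshold):
    """Greedy grouping of characters for one feature (sketch of claim 4).

    features:  {char number: feature vector}
    threshold: assumed maximum distance from a representative
    Returns {char number: group number}, the group number being the
    representative's char number, as in the group dictionary of FIG. 3.
    """
    chars = sorted(features)
    # Temporary group of each character, assuming it were the representative.
    temp = {r: {c for c in chars if euclidean(features[r], features[c]) <= threshold}
            for r in chars}
    # Adopt temporary groups largest first (ties: smaller char number).
    # Assumed rule: adopt a group only if it covers an uncovered character.
    covered, reps = set(), []
    for r in sorted(chars, key=lambda r: (-len(temp[r]), r)):
        if temp[r] - covered:
            reps.append(r)
            covered |= temp[r]
        if len(covered) == len(chars):
            break
    # A character in several adopted groups joins the nearest representative.
    return {c: min((r for r in reps if c in temp[r]),
                   key=lambda r: (euclidean(features[c], features[r]), r))
            for c in chars}
```

For example, with one-dimensional features 0.0, 1.0, 2.0, and 3.0 for characters 1 to 4 and a threshold of 1.0, the group of character 2 ({1, 2, 3}) is adopted first, the group of character 3 is adopted to cover character 4, and character 3 then belongs to its own, closer representative, giving `{1: 2, 2: 2, 3: 3, 4: 3}`.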

[0011] The invention according to claim 5 is a character recognition apparatus for recognizing a document image and converting it into code information, comprising: a feature quantity dictionary storing feature quantities of a plurality of features for character recognition extracted from standard character patterns; a group dictionary storing, for each of the plurality of features stored in the feature quantity dictionary, information indicating the character group to which each character belongs when characters having similar features are grouped; a representative feature quantity dictionary storing, for each feature, the representative feature quantity (the feature quantity of the character representing each created character group); input means for inputting a character pattern to be recognized; means for extracting the plurality of features for character recognition from the character pattern input by the input means; means for matching, for each feature, the feature quantity obtained from the input character pattern against the representative feature quantity of each character group to obtain an evaluation value of the matching; means for storing the evaluation value in a per-character evaluation value table as the evaluation value of the matching between the character pattern to be recognized and every character belonging to the corresponding character group; candidate character narrowing means for narrowing down candidate characters using the evaluation values stored in the evaluation value table for each character and each feature; and means for outputting, as the recognition result, the candidate characters finally obtained by the candidate character narrowing means.

[0012]

Embodiments of the Invention

Embodiments of the present invention are described below with reference to the drawings.

[0013] FIG. 1 is a block diagram of a character recognition apparatus according to the present invention. The processing performed by this apparatus is outlined below.

[0014] In the character recognition apparatus (101), a feature quantity dictionary (102) is created in advance that stores, for every character, the feature quantities of several features extracted from the character's standard pattern (referred to as the first feature quantity, the second feature quantity, and so on). A group creating unit (103) reads the feature quantities of each character from the feature quantity dictionary (102), divides the characters, separately for each feature, into groups of characters that are similar with respect to that feature (character groups), and determines the character number of the character representing each group. The character number of the representative character is stored as the group number of each character in a group dictionary (104) created for each feature. In this figure, the character groups for the first feature quantity are stored in secondary storage 104-a, and those for the second feature quantity in secondary storage 104-b. The structures of the feature quantity dictionary (102) and the group dictionary (104) are described later with reference to FIGS. 2 and 3.

[0015] Next, a representative feature quantity dictionary creating unit (105) extracts the representative character of each character group from the group dictionary (104), reads that character's feature quantity from the feature quantity dictionary (102), and stores it in a representative feature quantity dictionary (106) as the representative feature quantity of the character group. In this figure, the representative feature quantities for the first feature quantity are stored in secondary storage 106-a, and those for the second feature quantity in secondary storage 106-b. The representative feature quantity dictionary (106) is described later with reference to FIG. 4.
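Because a group number is the representative's character number, building the representative feature quantity dictionary reduces to a lookup. A minimal Python sketch, assuming the dictionaries are held in memory as Python mappings (the text stores them in secondary storage) and a 0-based feature index; `build_rep_dict` is a hypothetical name:

```python
def build_rep_dict(feature_dict, group_dict, n):
    """Representative feature quantity dictionary for one feature (hypothetical).

    feature_dict: {char number: [1st feature vector, 2nd feature vector, ...]}
    group_dict:   {char number: group number} for that feature
    n:            0-based index of the feature (the text counts from 1)
    """
    # Group numbers are the representatives' char numbers, so each
    # representative feature is looked up directly in the feature dictionary.
    return {g: feature_dict[g][n] for g in sorted(set(group_dict.values()))}
```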

[0016] The dictionaries described above (102, 104, 106) are created in advance as preprocessing for character recognition.

[0017] Next, when a character pattern (107) is input to the character recognition apparatus (101), a feature quantity extraction unit (108) first extracts the feature quantities used for recognition (first feature quantity, second feature quantity, ...) from the input character pattern (107) and outputs them to a matching processing unit (109). As preparation for matching, the matching processing unit (109) initializes a candidate character table (110) with every character registered in the feature quantity dictionary (102) as a candidate (the candidate character table (110) is described in detail with reference to FIG. 5). The matching processing unit (109) then performs matching for the first feature quantity, the second feature quantity, and so on in turn, narrowing down the candidate characters in the candidate character table (110). Matching for the n-th feature quantity proceeds as follows. First, for the candidate characters stored in the candidate character table (110), the group numbers of all character groups of the n-th feature quantity that contain those candidates are obtained. Then, for each of those groups, the representative feature quantity stored in the representative feature quantity dictionary (106) is matched against the feature quantity of the input character pattern (107), and the result is stored in an evaluation value table (111). The evaluation value table (111) has an area for storing an evaluation value for every character (it is described in detail with reference to FIG. 6), and the result of matching against a representative feature quantity is set as the evaluation value of every candidate character in that character group. A single matching against a representative feature quantity therefore yields matching results for several characters, which reduces matching time. Next, a candidate character narrowing unit (112) narrows down the candidate characters in the candidate character table (110) based on the evaluation values of the characters stored in the evaluation value table (111). This completes the matching and candidate narrowing for the n-th feature quantity. The process is repeated for each feature quantity in turn, and once the final recognition result has been obtained, a recognition result output unit (113) outputs the recognition result (114). The recognition result (114) is the set of candidate characters narrowed down to the level of coarse or intermediate classification.

[0018] FIG. 2 shows the contents of the feature quantity dictionary (102). As shown, the feature quantity dictionary (102) stores the character number (201) of each character in the dictionary together with the feature quantities (202-a, 202-b, 202-c, ...) of that character. Each feature quantity is multidimensional, and the value of each dimension is stored as an element of the feature quantity.

[0019] FIG. 3 shows the contents of the group dictionary (104). As shown, the group dictionary (104) stores, for each feature quantity, the character number (301) and the number of the group to which that character belongs (302-a, 302-b, 302-c, ...). The group number is the character number of the group's representative character (that is, of the character having the representative feature quantity). For example, the first-feature group numbers (302-a) in FIG. 3 show that, with respect to the first feature quantity, character 1 belongs to the group represented by character 3, character 2 belongs to the group represented by character 2 (itself), character 3 belongs to the group represented by character 3 (itself), and character 4 belongs to the group represented by character 2. The same applies to the other feature quantities.
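The group dictionary maps characters to groups, while group-based matching needs the reverse view (group to members). A minimal Python sketch using the first-feature column of FIG. 3; the helper names are hypothetical:

```python
from collections import defaultdict


def invert_group_dict(group_dict):
    """Turn {char number: group number} into {group number: [member char numbers]}."""
    groups = defaultdict(list)
    for ch in sorted(group_dict):
        groups[group_dict[ch]].append(ch)
    return dict(groups)


def representatives_are_self_members(group_dict):
    """Check that every group number is a char number that maps to itself."""
    return all(group_dict[g] == g for g in set(group_dict.values()))


# First-feature column (302-a) of the FIG. 3 example.
fig3_first_feature = {1: 3, 2: 2, 3: 3, 4: 2}
```

Inverting this column gives two groups, `{2: [2, 4], 3: [1, 3]}`, and each representative (characters 2 and 3) belongs to its own group, as the text describes.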

[0020] FIG. 4 shows the contents of the representative feature quantity dictionary (106). The figure shows the representative feature quantity dictionary for the first feature quantity (106-a), which stores group numbers (401) and the representative feature quantity (402) of each group. As described above, the representative feature quantity (402) is the feature quantity of the group's representative character (the character whose character number equals the group number (401)). Representative feature quantities for the other features are stored in the same format. The dictionaries of FIGS. 2 to 4 (102, 104, 106) are created as preprocessing before character recognition is performed.

[0021] FIG. 5 shows the contents of the candidate character table (110). The candidate character table (110) is a dynamic table used when a character pattern (107) is input for recognition. It stores the number of candidate characters (501) and the character numbers of the candidates (502). The table is initialized each time a single character pattern is input for recognition (initialization sets the character numbers of all characters in the feature quantity dictionary (102)), and is then used to narrow down the candidates each time matching is performed for a feature quantity.

[0022] FIG. 6 shows the contents of the evaluation value table (111). The evaluation value table (111) is used when a character pattern (107) is input for recognition. It stores the character number (601) of each character and that character's evaluation value for each feature quantity (602-a, 602-b, 602-c, ...). The character number field (601) is preset with the character numbers of all characters in the feature quantity dictionary (102).

[0023] FIG. 7 is a flowchart of the recognition processing in the apparatus of FIG. 1. First, a character pattern (107) is read (process 701), and the feature quantities of the input pattern are extracted (process 702). Next, all characters are set as candidates in the candidate character table (110) as the initial recognition candidates (process 703). The input character pattern (107) is then recognized by setting a feature-number loop counter n to an initial value of 1 (process 704) and repeating matching and candidate narrowing for each feature quantity. This iterative processing is described below.

[0024] First, for the n-th feature quantity, the group numbers of all character groups containing the candidate characters currently set in the candidate character table (110), that is, the groups to be matched, are obtained (process 705). Process 705 is described in detail later with reference to FIG. 8. Next, for the n-th feature quantity, the representative feature quantities of all character groups obtained in process 705 are matched against the feature quantity of the input character pattern extracted in process 702 to obtain evaluation values. Each evaluation value is stored in the evaluation value table (111) as the evaluation value of the characters in the group (process 706). Because the evaluation value obtained by matching against a group's representative feature quantity is stored as the evaluation value of every candidate character belonging to that group, when a group contains several candidates, one matching against the representative feature quantity is equivalent to matching against all of them. This reduces the matching workload and improves recognition speed.

[0025] When the matching of process 706 is complete, the candidate characters in the candidate character table (110) are narrowed down based on the evaluation values obtained, that is, the table (110) is reset so that only candidates whose evaluation value is equal to or greater than a predetermined value remain (process 707). When recognition processing for the n-th feature quantity is complete, the feature-number loop counter n is incremented by 1 (process 708). It is then determined whether matching is finished (process 709). If matching has not yet been performed for all feature quantities and the number of candidate characters is still at least a predetermined number, the process returns to process 705 to perform matching and narrowing for the next feature quantity. When matching and narrowing have been completed for all feature quantities, or when the number of candidates falls below the predetermined number at an intermediate stage and the recognition candidates are thus determined (process 709), matching and narrowing end, and the resulting recognition candidates (the candidate characters remaining in the candidate character table (110)) are output (process 710).
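The loop of FIG. 7 can be sketched end to end in Python. This is a minimal sketch, not the patented implementation: it assumes Euclidean distance, an evaluation value of 1 / (1 + distance), per-feature thresholds for process 707, a fixed candidate count for the early stop in process 709, and in-memory dictionaries; the function and parameter names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(input_feats, group_dicts, rep_dicts, thresholds, min_candidates):
    """Hypothetical sketch of the FIG. 7 loop (processes 703 to 710).

    input_feats:    feature vectors of the input pattern, one per feature
    group_dicts:    per feature, {char number: group number}
    rep_dicts:      per feature, {group number: representative feature vector}
    thresholds:     per feature, minimum evaluation value to stay a candidate
    min_candidates: stop early once fewer candidates than this remain
    """
    candidates = sorted(group_dicts[0])                  # 703: all characters
    table = {ch: {} for ch in candidates}                # evaluation value table
    for n, feat in enumerate(input_feats):               # 704/708: counter n
        groups = {group_dicts[n][ch] for ch in candidates}           # 705
        value = {g: 1.0 / (1.0 + euclidean(feat, rep_dicts[n][g]))
                 for g in groups}                        # 706: one match per group
        for ch in candidates:
            table[ch][n] = value[group_dicts[n][ch]]     # shared within a group
        candidates = [ch for ch in candidates
                      if table[ch][n] >= thresholds[n]]  # 707: narrowing
        if len(candidates) < min_candidates:             # 709: early stop
            break
    return candidates                                    # 710: output
```

Groups whose members have all been eliminated at an earlier feature are never matched, and each group that is matched is matched once, however many candidates it contains.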

[0026] FIG. 8 is a flowchart of the group extraction performed in process 705. This process obtains, without duplicates, the numbers of all groups (for the n-th feature quantity to be matched in the following process 706) that contain the candidate characters in the candidate character table (110). In the processing of FIG. 7, for example, once matching and narrowing for the first feature quantity have finished, the candidate character table (110) holds the candidates remaining after narrowing by the first feature quantity. Matching and narrowing for the second feature quantity are performed next, but because narrowing by the first feature quantity has already been done, matching against the representative feature quantities of every character group of the second feature quantity is not necessarily required. The processing of FIG. 8 therefore obtains, for the second feature quantity to be matched next, the group numbers of all character groups that contain the candidates set in the candidate character table (110). Matching for the second feature quantity then needs to be performed against representative feature quantities only for the group numbers so obtained. FIG. 8 performs this group extraction.

[0027]

First, in process 801, the group extraction management table (901) shown in FIG. 9 is initialized. The group extraction management table (901) consists of group numbers (902) and extraction flags (903). A group whose extraction flag (903) is 0 does not need to be matched, and a group whose extraction flag (903) is 1 is a matching target.

In the initialization of process 801, the group numbers of all groups of the nth feature amount currently being processed are set in the group number (902) field, and all the corresponding extraction flags (903) are cleared to 0. All the group numbers of the nth feature amount are obtained by referring to the representative feature amount dictionary (106).

[0028]

Next, processes 802 to 807 set the extraction flag (903) of each character group to be matched, which records whether each character group has already been extracted. The steps are:

1. Process 802 reads one character that remains as a candidate from the candidate character table (110).
2. Process 803 reads the group dictionary (104) to obtain the number of the group to which that candidate character belongs.
3. Process 804 consults the group extraction management table (901) and reads the extraction flag (903) for the group number obtained in process 803.
4. Process 805 checks whether that group has already been extracted as a recognition target.
5. If it has not (that is, if its extraction flag (903) is 0), process 806 sets that extraction flag (903) in the group extraction management table (901) to 1.
6. Process 807 checks whether any unprocessed candidate characters remain in the candidate character table (110). If any remain, the process returns to process 802 to read the next character and repeats processes 803 to 806.
7. If none remain, process 808 extracts only the groups whose extraction flag (903) is set as the groups to be matched, and the process ends.
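Processes 801 to 808 map onto a few lines of Python. This sketch uses hypothetical names:

- A dictionary keyed by group number (902), with 0/1 values, plays the role of the extraction flags (903) in the group extraction management table (901).
- The group dictionary (104) and the representative feature amount dictionary (106) are assumed to be available as plain mappings for the nth feature amount.

```python
def extract_groups(candidate_table, group_dict_n, rep_dict_n):
    """Sketch of processes 801-808 for the nth feature amount.

    candidate_table -- candidate characters still remaining (table 110)
    group_dict_n    -- char -> group number for the nth feature (dictionary 104)
    rep_dict_n      -- group number -> representative feature amount (dictionary 106)
    """
    # Process 801: one flag per group of the nth feature, all cleared to 0.
    extraction_flags = {g: 0 for g in rep_dict_n}
    # Processes 802-807: visit each remaining candidate character once.
    for ch in candidate_table:
        g = group_dict_n[ch]          # process 803: look up the character's group
        if extraction_flags[g] == 0:  # processes 804-805: not yet extracted?
            extraction_flags[g] = 1   # process 806: mark as a matching target
    # Process 808: only flagged groups are matched next.
    return [g for g, flag in extraction_flags.items() if flag == 1]


# Hypothetical example: candidates a and e share group 2, b is in group 0.
print(extract_groups(["a", "b", "e"],
                     {"a": 2, "b": 0, "c": 1, "e": 2},
                     {0: (0, 0), 1: (5, 5), 2: (9, 9)}))  # -> [0, 2]
```

The flag check queues a group for matching once even when several remaining candidates share it. The cost is therefore proportional to the number of candidates plus the number of groups.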

[0029]

Next, the procedure for creating the group dictionary (104) is described. FIGS. 10 and 11 illustrate the concept of the method for grouping feature amounts. Both figures assume a feature amount with two elements and show the feature amounts of the characters in the dictionary mapped onto a plane.

In FIG. 10, the points marked by double circles (1001) show the distribution of the feature amounts. A circle whose radius is a fixed threshold (1003) is drawn around a representative feature amount (1002), and the inside of this circle is the grouping range (1004). The characters within the grouping range (1004) are extracted and form one group.

[0030]

Specifically, for each feature, every character in the feature amount dictionary (102) is in turn assumed to have the representative feature amount (1002). A group is formed around it with the predetermined threshold (1003), as in FIG. 10, and the number of characters in that group is counted.

Characters are then taken in descending order of that count. If a character is already contained in another character group, it is not made a representative. Otherwise, it becomes a representative, and its feature amount becomes the representative feature amount (1002). Repeating this process sets all the representative feature amounts and completes the grouping. The details are described later with reference to the flowchart of FIG. 12.

[0031]

If, in the above process, one character falls within the ranges of multiple groups, it is assigned to the group with the nearest representative feature amount, as shown in FIG. 11. In the figure, a character (1101) lies within the ranges of two groups, whose representative feature amounts are:

- representative feature amount 1 (1102-a), at distance 1103-a from the character;
- representative feature amount 2 (1102-b), at distance 1103-b from the character.

Both distances are computed. The character (1101) is then registered as a member of the group whose representative feature amount is nearest, which in the figure is the group of representative feature amount 1 (1102-a).

[0032]

When the distance threshold used for grouping is given as B, whether a character with an n-dimensional feature belongs to a group can be determined by Equation 1 below. In Equation 1, X_i (i = 1, ..., n) is the feature amount of the character and G_i (i = 1, ..., n) is the representative feature amount. The right-hand side of Equation 1 is the square of the distance between X_i (i = 1, ..., n) and G_i (i = 1, ..., n).

【００３３】[0033]

[Equation 1]

$$B^2 \geq \sum_{i=1}^{n} \left( X_i - G_i \right)^2$$
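The membership test of Equation 1 can be written directly. In this sketch, the function name `in_group` is hypothetical. Comparing the squared distance with B squared avoids a square root. Treating a character exactly at distance B as inside the group is an assumption, based on the description's wording that the distance is "within a threshold".

```python
def in_group(x, g, b):
    """Return True if feature amount x lies within distance b of representative g.

    Compares the squared distance with b**2, so no square root is needed.
    A point exactly at distance b counts as inside (assumed boundary rule).
    """
    if len(x) != len(g):
        raise ValueError("feature amounts must have the same dimension n")
    squared_distance = sum((xi - gi) ** 2 for xi, gi in zip(x, g))
    return squared_distance <= b ** 2


print(in_group((3, 4), (0, 0), 5))    # -> True  (25 <= 25)
print(in_group((3, 4), (0, 0), 4.9))  # -> False (25 > 24.01)
```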

[0034]

FIG. 12 is a flowchart of the feature amount grouping process. The steps are:

1. Process 1201 determines candidate representative feature amounts. For every character in the feature amount dictionary (102), it takes that character's feature amount as the representative feature amount and counts how many characters would fall in the resulting group.
2. Process 1202 takes characters in descending order of the counts obtained in process 1201.
3. Process 1203 checks whether the character taken is registered in any group yet. In the initial stage, no character is registered in any group.
4. If the character is not yet registered, process 1204 uses its feature amount as a representative feature amount and creates a group of the characters within the threshold distance of this representative feature amount.
5. Process 1205 handles members of the new group that already belong to another group. For each such character, it compares the character's distance to the representative feature amount of its existing group with its distance to the representative feature amount of the group now being registered. The character is registered only in the nearer group.
6. Processes 1202 to 1205 are repeated until every character belongs to some group (process 1206).
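The grouping of FIG. 12, together with the nearest-representative rule of FIG. 11, can be sketched for a single feature as below. The function and parameter names are hypothetical. The source leaves two rules open, and this sketch assumes:

- ties in the neighbour count are broken by character name;
- a character moves to a new group only when that group's representative is strictly nearer.

The function would be run once per feature to fill the group dictionary (104) and the representative feature amount dictionary (106).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature amounts."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_groups(features, b):
    """Greedy grouping of one feature, following processes 1201-1206.

    features -- char -> feature amount for this feature (from dictionary 102)
    b        -- grouping distance threshold B
    Returns (group_of, reps): char -> group number, group number -> representative char.
    """
    b2 = b * b
    # Process 1201: the members each character would have as a representative.
    neighbours = {c: [d for d in features if sq_dist(features[c], features[d]) <= b2]
                  for c in features}
    # Process 1202: characters with the most neighbours first (ties by name).
    order = sorted(features, key=lambda c: (-len(neighbours[c]), c))
    group_of, reps = {}, {}
    for c in order:
        if c in group_of:  # process 1203: already grouped, so not a representative
            continue
        g = len(reps)
        reps[g] = c        # process 1204: c becomes the representative of group g
        for d in neighbours[c]:
            if d not in group_of:
                group_of[d] = g
            else:
                # Process 1205: keep d only in the group with the nearer representative.
                old_rep = features[reps[group_of[d]]]
                if sq_dist(features[d], features[c]) < sq_dist(features[d], old_rep):
                    group_of[d] = g
    # Process 1206: every character is its own neighbour, so all are now grouped.
    return group_of, reps


groups, representatives = build_groups({"p": (0.0,), "q": (2.0,), "r": (4.0,), "s": (4.1,)}, 2.0)
print(groups)           # -> {'p': 0, 'q': 0, 'r': 1, 's': 1}
print(representatives)  # -> {0: 'q', 1: 's'}
```

In the example, r first joins q's group but moves to s's group, whose representative is nearer; this is the FIG. 11 case. Process 1201 compares every pair of characters, so this sketch is quadratic in the dictionary size. That cost is paid once, when the dictionaries are built, and not at recognition time.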

【００３５】[0035]

[Effects of the Invention] According to the present invention, a single matching against a representative feature amount yields matching results for multiple characters. The recognition evaluation values obtained this way carry some error. If grouping is performed with a suitable threshold, however, that error is not large enough to affect the recognition rate. Recognition speed can therefore be improved without lowering recognition accuracy.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a character recognition device according to the present invention.

FIG. 2 is a diagram showing the contents of a feature amount dictionary.

FIG. 3 is a diagram showing the contents of a group dictionary.

FIG. 4 is a diagram showing the contents of a representative feature amount dictionary.

FIG. 5 is a diagram showing the contents of a candidate character table.

FIG. 6 is a diagram showing the contents of an evaluation value table.

FIG. 7 is a flowchart of the character recognition process.

FIG. 8 is a flowchart of the group extraction process.

FIG. 9 is a diagram showing the contents of a group extraction management table.

FIG. 10 is a diagram showing the grouping process.

FIG. 11 is a diagram showing the handling of characters included in multiple groups.

FIG. 12 is a flowchart of the grouping process.

[Explanation of symbols]

101: character recognition device, 102: feature amount dictionary, 103: group creation unit, 104: group dictionary, 105: representative feature amount dictionary creation unit, 106: representative feature amount dictionary, 107: character pattern, 108: feature amount extraction unit, 109: matching processing unit, 110: candidate character table, 111: evaluation value table, 112: candidate character narrowing unit, 113: recognition result output unit, 114: recognition result, 201: character number of the feature amount dictionary, 202: feature amount storage area, 301: character number of the group dictionary, 302: group number storage area, 401: group number of the representative feature amount dictionary, 402: representative feature amount storage area, 501: number of candidate characters in the candidate character table, 502: candidate character number in the candidate character table, 601: character number of the evaluation value table, 602: evaluation value storage area of the evaluation value table, 901: group extraction management table, 902: group number of the group extraction management table, 903: extraction flag of the group extraction management table, 1001: mapping of the feature amounts of dictionary characters, 1002: mapping of the representative feature amount, 1003: grouping threshold, 1004: grouping range, 1101: mapping of the feature amount of a character included in multiple groups, 1102: representative feature amount, 1103: distance to the representative feature amount.

Claims

[Claims]

1. A character recognition method for recognizing a document image and converting it into code information, characterized by comprising:
a step of extracting a plurality of features for character recognition from standard character patterns of characters and storing the extracted feature amounts in a feature amount dictionary;
a step of, for the plurality of features stored in the feature amount dictionary, creating for each feature a plurality of character groups by grouping characters having similar features, and determining a representative character of each character group;
a step of extracting a plurality of features for character recognition from a recognition target character pattern extracted from a document image;
an evaluation value acquisition step of performing, for each feature, matching between the feature amount obtained from the recognition target character pattern and the representative feature amount to obtain a matching evaluation value, the representative feature amount being the feature amount of the representative character of each character group created for that feature; and
a candidate character narrowing step of narrowing down candidate characters by using the obtained evaluation value as the evaluation value for matching the recognition target character pattern against all the characters belonging to that character group.

2. A character recognition method for recognizing a document image and converting it into code information, characterized by comprising:
a step of extracting a plurality of features for character recognition from standard character patterns of characters and storing the extracted feature amounts in a feature amount dictionary;
a step of, for the plurality of features stored in the feature amount dictionary, creating for each feature a plurality of character groups by grouping characters having similar features, determining a representative character of each character group, and creating a group dictionary that stores information indicating which character group each character belongs to;
a step of creating a representative feature amount dictionary that stores, for each feature, the representative feature amount, which is the feature amount of the representative character of each character group for that feature;
a step of extracting a plurality of features for character recognition from a recognition target character pattern extracted from a document image;
an evaluation value acquisition step of performing, for each feature, matching between the feature amount obtained from the recognition target character pattern and the representative feature amount of each character group for that feature, to obtain a matching evaluation value;
a step of storing the obtained evaluation value in the evaluation value table of each character, as the evaluation value for matching the recognition target character pattern against all the characters belonging to that character group; and
a candidate character narrowing step of narrowing down candidate characters by using the evaluation values stored in the evaluation value table for each character.

3. The character recognition method according to claim 1 or 2, wherein the feature amounts for the plurality of features for character recognition are called the nth feature amounts (n = 1, 2, 3, ...), and wherein, after the matching and the narrowing down for the nth feature amount have been performed in the evaluation value acquisition step and the candidate character narrowing step, and before the next matching and narrowing down for the (n+1)th feature amount:
all character groups of the (n+1)th feature amount that contain the candidate characters narrowed down for the nth feature amount are obtained; and
in the next evaluation value acquisition step for the (n+1)th feature amount, matching for the (n+1)th feature amount is performed with respect to the obtained groups.

4. The character recognition method according to claim 1 or 2, wherein the grouping of characters is performed by:
creating, for every character stored in the feature amount dictionary, a temporary group by assuming that character to be a representative character and taking the characters within a predetermined threshold distance of it;
calculating the number of characters belonging to each temporary group;
determining temporary groups as regular groups in order starting from those with more characters; and
when a character is included in multiple groups, making that character belong to the group whose representative character's feature amount is at the shorter distance.

5. A character recognition device for recognizing a document image and converting it into code information, characterized by comprising:
a feature amount dictionary storing feature amounts for a plurality of features for character recognition extracted from standard character patterns of characters;
a group dictionary storing, for the plurality of features stored in the feature amount dictionary, information indicating which character group each character belongs to, the character groups being formed for each feature by grouping characters having similar features;
a representative feature amount dictionary storing, for each feature, representative feature amounts, which are the feature amounts of the representative characters of the created character groups;
input means for inputting a recognition target character pattern;
means for extracting a plurality of features for character recognition from the recognition target character pattern input by the input means;
means for performing, for each feature, matching between the feature amount obtained from the input recognition target character pattern and the representative feature amount of each character group, and storing the matching evaluation value in the evaluation value table of each character, as the evaluation value for matching the recognition target character pattern against all the characters belonging to that character group;
candidate character narrowing means for narrowing down candidate characters by using the evaluation values stored in the evaluation value table for each character; and
means for outputting, as the recognition result, the candidate characters finally obtained by the candidate character narrowing means.