JP4861730B2

JP4861730B2 - Character recognition device, character recognition method, character recognition program, and integrated circuit

Info

Publication number: JP4861730B2
Application number: JP2006083726A
Authority: JP
Inventors: 穂高倉; 磨理子竹之内
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 2005-10-12
Filing date: 2006-03-24
Publication date: 2012-01-25
Anticipated expiration: 2026-03-24
Also published as: JP2007133850A

Description

The present invention relates to character recognition technology, and more particularly to a character recognition device and method for recognizing characters whose shapes vary greatly, such as characters in documents set in various typefaces and in handwritten forms.

One conventional character recognition device comprises: a classification dictionary, built by clustering feature quantities extracted from a large number of training character images, that stores the average feature quantity of each cluster as a reference feature quantity; classification feature extraction means that extracts several features from a character image cut out of the scanned image; classification means that matches the extracted feature quantities against the reference feature quantities in the classification dictionary and selects candidate clusters; detailed identification means that, for specific pairs of candidate clusters that are easily confused, decides between the two clusters using only those classification features that tend to differ between them; and verification means that checks the validity of the recognition result using the value ranges of features characteristic of the cluster ranked first by the detailed identification means (see Patent Document 1).
Patent Document 1: JP-A-8-305804

However, because the conventional configuration uses the same feature quantities for the classification, detailed identification, and verification processes, none of these processes works well when the feature quantities extracted from a cut-out character image have low similarity to every reference feature quantity in the classification dictionary, which happens particularly with handwritten characters whose shapes vary greatly. As a result, misrecognitions and rejections of recognition results increase.

The present invention solves the above conventional problem. Its object is to provide a character recognition device that, when reading documents and forms, accurately recognizes even handwritten characters whose shapes vary greatly, and that reliably rejects recognition results that are wrong or that come from characters that are difficult to read.

To solve the above problem, the present invention is a character recognition device that receives an input document image and recognizes each character contained in it. The device comprises: a classification dictionary in which a reference feature quantity for each character is registered; character image extraction means for extracting character images from the document image; feature quantity extraction means for extracting a feature quantity from each extracted character image; candidate character selection means for calculating the similarity between the extracted feature quantity and the reference feature quantities and selecting characters with high similarity as candidate characters; similarity calculation means for calculating, using the feature quantities extracted by the feature quantity extraction means, the similarity between character images in the same document; and recognition verification means for verifying how certain the recognition of a candidate character is, based on its similarity as a candidate character and on the similarity between the character images.
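The similarity calculation means compares character images of the same document with each other rather than with the dictionary. A minimal sketch of that step follows, assuming each character image has already been reduced to a numeric feature vector, that the distance is the city-block distance (one of the measures the embodiment mentions for dictionary matching), and that distance is turned into similarity with 1 / (1 + d). That mapping and the function names are hypothetical choices for illustration only.

```python
from typing import List, Sequence


def city_block_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 (city-block) distance between two feature vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def distance_to_similarity(d: float) -> float:
    """Map a non-negative distance to a similarity in (0, 1]; identical vectors give 1.0.
    The 1 / (1 + d) mapping is an assumption made for this sketch."""
    return 1.0 / (1.0 + d)


def document_similarity_matrix(features: List[Sequence[float]]) -> List[List[float]]:
    """sim[i][j] = similarity between character images i and j of the same document."""
    n = len(features)
    return [
        [distance_to_similarity(city_block_distance(features[i], features[j])) for j in range(n)]
        for i in range(n)
    ]


# Three hypothetical 2-D feature vectors taken from one document.
sim = document_similarity_matrix([[0, 0], [0, 1], [3, 3]])
print(sim[0][1])  # 0.5
```

The matrix is symmetric with 1.0 on the diagonal, so in practice only the upper triangle needs computing.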

With this configuration, even for handwritten characters whose shapes vary greatly, whether a recognition result is correct is verified in light of the similarity between character images in the same document, so the document image can be recognized accurately.
The recognition verification means may include a calculation unit that calculates the product of the similarity r to the verification target character, calculated by the similarity calculation means, and the similarity R calculated by the candidate character selection means; and a recognition result changing unit that, when another candidate character has a larger product than the product obtained for the same candidate character selected by the candidate character selection means, changes the recognition result of the verification target character to that other candidate character.
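One possible reading of the product rule above is sketched here. It assumes that, for each candidate character c of the target image, r is the highest similarity between the target and any other image in the document currently recognized as c (0.0 if there is none), and R is the target's dictionary similarity for c. The aggregation by maximum, the zero default, and the name `verify_by_document` are assumptions; the source does not specify them.

```python
from typing import Dict, List, Tuple


def verify_by_document(
    candidates: List[Tuple[str, float]],
    other_images: List[Tuple[str, float]],
) -> str:
    """Return the (possibly changed) recognition result for one target character image.

    candidates: (character, R) pairs for the target, best first; R is the
        similarity to the classification dictionary.
    other_images: (current recognition result, r) for every other character image
        in the same document; r is that image's similarity to the target.
    """
    # Strongest in-document support r for each recognized character (0.0 if none).
    support: Dict[str, float] = {}
    for char, r in other_images:
        support[char] = max(support.get(char, 0.0), r)

    best_char = candidates[0][0]
    best_score = candidates[0][1] * support.get(best_char, 0.0)
    for char, big_r in candidates[1:]:
        score = big_r * support.get(char, 0.0)  # product r * R
        if score > best_score:
            best_char, best_score = char, score
    return best_char


# Hypothetical values: "6" wins on R alone, but the document contains a "0" that
# looks much more like this image than any "6" does.
print(verify_by_document([("6", 0.90), ("0", 0.88)], [("0", 0.95), ("6", 0.40)]))  # 0
```

Here 0.88 × 0.95 = 0.836 exceeds 0.90 × 0.40 = 0.36, so the result changes from "6" to "0". The idea is that one writer tends to form the same digit the same way throughout a form.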

This configuration exploits the fact that character images in the same document tend to share similar features, which makes it possible to change a selected candidate character to another candidate character that occurs in the same document.
The device may further comprise an individual verification dictionary that describes, as rules for each character, positive conditions indicating the morphological features a candidate character should have and negative conditions indicating the morphological features it should not have; and individual character verification means that extracts morphological features from the character image, determines whether the selected candidate character satisfies the positive or the negative conditions, and rejects the candidate character when it is determined to satisfy a negative condition.

With this configuration, each selected candidate character is checked individually against the rules and impossible candidates are rejected, which prevents erroneous recognition.
The device may further comprise a similar candidate character identification dictionary that describes, for pairs of highly similar characters, the morphological features and the rules used to tell the two characters apart; and candidate character determination means that, when candidate characters selected for one character image form a pair registered in that dictionary, extracts morphological features from the character image, checks them against the rules, and determines which of the two characters should be the candidate.

With this configuration, the device checks which of two highly similar characters is the more appropriate candidate, which improves recognition accuracy.
The morphological features include at least one of loop position or count, average line density within a predetermined range, convex end point position or count, and concave end point position or count.
With this configuration, verifying selected candidate characters against the characteristic morphological features of each character improves recognition accuracy and, at a minimum, allows misrecognized candidates to be rejected.
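Two of the listed morphological features can be sketched on a binary character image (1 = ink). Here the line density of a row is taken to be the number of strokes a horizontal scan line crosses, matching how FIG. 5(a) is described, and the loop count is the number of background regions that do not reach the image border. The 4-connected flood fill and all names are illustrative assumptions; convex and concave end points need contour analysis and are left out.

```python
from collections import deque
from typing import List

Image = List[List[int]]  # binary character image, 1 = ink, 0 = background


def line_density(row: List[int]) -> int:
    """Number of strokes a horizontal scan line crosses (runs of ink pixels)."""
    count, prev = 0, 0
    for px in row:
        if px == 1 and prev == 0:
            count += 1
        prev = px
    return count


def average_line_density(img: Image, top: int, bottom: int) -> float:
    """Mean line density over rows top..bottom-1 (the 'predetermined range')."""
    rows = img[top:bottom]
    return sum(line_density(r) for r in rows) / len(rows)


def loop_count(img: Image) -> int:
    """Count enclosed background regions (holes) using a 4-connected flood fill."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    holes = 0
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] != 0 or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue, touches_border = deque([(sy, sx)]), False
            while queue:
                y, x = queue.popleft()
                if y in (0, h - 1) or x in (0, w - 1):
                    touches_border = True
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and img[ny][nx] == 0:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if not touches_border:
                holes += 1
    return holes


ZERO = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
print(loop_count(ZERO), average_line_density(ZERO, 2, 5))  # 1 2.0
```

Using 4-connectivity for the background pairs naturally with 8-connected strokes, so a diagonal joint in the ink still closes a loop.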

The invention also provides a character recognition method for a character recognition device that has a classification dictionary in which a reference feature quantity for each character is registered, receives an input document image, and recognizes each character contained in it. The method comprises: a character image extraction step of extracting character images from the document image; a feature quantity extraction step of extracting a feature quantity from each extracted character image; a candidate character selection step of calculating the similarity between the extracted feature quantity and the reference feature quantities and selecting characters with high similarity as candidate characters; a similarity calculation step of calculating, using the feature quantities extracted in the feature quantity extraction step, the similarity between character images in the same document; and a recognition verification step of verifying how certain the recognition of a candidate character is, based on its similarity as a candidate character and on the similarity between the character images.

With this method, even for handwritten characters whose shapes vary greatly, whether a recognition result is correct is verified in light of the similarity between character images in the same document, so the document image can be recognized accurately.
The invention also provides a character recognition program that causes a character recognition device, which has a classification dictionary in which a reference feature quantity for each character is registered and which receives an input document image and recognizes each character contained in it, to execute: a character image extraction step of extracting character images from the document image; a feature quantity extraction step of extracting a feature quantity from each extracted character image; a candidate character selection step of calculating the similarity between the extracted feature quantity and the reference feature quantities and selecting characters with high similarity as candidate characters; a similarity calculation step of calculating, using the feature quantities extracted in the feature quantity extraction step, the similarity between character images in the same document; and a recognition verification step of verifying how certain the recognition of a candidate character is, based on its similarity as a candidate character and on the similarity between the character images.

By applying this character recognition program to a character recognition device, even for handwritten characters whose shapes vary greatly, whether a recognition result is correct is verified in light of the similarity between character images in the same document, so the document image can be recognized accurately.
The invention also provides an integrated circuit for a character recognition device that receives an input document image and recognizes each character contained in it. The integrated circuit comprises: a classification dictionary in which a reference feature quantity for each character is registered; character image extraction means for extracting character images from the document image; feature quantity extraction means for extracting a feature quantity from each extracted character image; candidate character selection means for calculating the similarity between the extracted feature quantity and the reference feature quantities and selecting characters with high similarity as candidate characters; similarity calculation means for calculating, using the feature quantities extracted by the feature quantity extraction means, the similarity between character images in the same document; and recognition verification means for verifying how certain the recognition of a candidate character is, based on its similarity as a candidate character and on the similarity between the character images.

By using this integrated circuit in a character recognition device, even for handwritten characters whose shapes vary greatly, whether a recognition result is correct is verified in light of the similarity between character images in the same document, so the document image can be recognized accurately.

An embodiment of the character recognition device according to the present invention is described below with reference to the drawings.
(One embodiment)
FIG. 1 is a block diagram of one embodiment of the character recognition device according to the present invention.
This character recognition device comprises a classification dictionary 101, a similar candidate character identification dictionary 102, an individual verification dictionary 103, a document image input receiving unit 104, a character image extraction unit 105, a feature quantity extraction unit 106, a candidate character selection unit 107, a morphological feature extraction unit 108, a candidate character determination unit 109, an individual character verification unit 110, a similarity calculation unit 111, a recognition verification unit 112, and a recognition result output unit 113.

The classification dictionary 101 is held on a magnetic disk or similar storage and has the data structure shown in FIG. 2. As in the example of FIG. 3, it registers many dictionary data items 306, each consisting of a cluster number 301, an identifier 302, a character code 303, an attribute flag 304, and an average feature 305. The entry in FIG. 3 represents the character "中", as its identifier 302 indicates. The classification dictionary 101 is normally a multi-template dictionary that holds several reference feature quantities with different glyph shapes for each character, so the candidate characters may include the same character more than once, each time from a different cluster.
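A dictionary data item 306 maps naturally onto a small record type. The sketch below assumes the character code is a Unicode code point and treats the attribute flag as an opaque integer, since the text does not say what either encoding is; `DictionaryEntry` and `distinct_characters` are hypothetical names. The helper shows the multi-template effect just described: a ranked candidate list can repeat a character, one entry per cluster.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DictionaryEntry:
    cluster_number: int                  # cluster number 301
    identifier: str                      # identifier 302, e.g. "中"
    char_code: int                       # character code 303 (Unicode here, an assumption)
    attribute_flag: int                  # attribute flag 304 (meaning not specified here)
    average_feature: Tuple[float, ...]   # average feature 305 = reference feature quantity


def distinct_characters(ranked: List[DictionaryEntry]) -> List[str]:
    """Collapse a ranked candidate list that may repeat a character (one entry per
    cluster in a multi-template dictionary) into distinct characters, keeping rank order."""
    seen, out = set(), []
    for entry in ranked:
        if entry.identifier not in seen:
            seen.add(entry.identifier)
            out.append(entry.identifier)
    return out


# Hypothetical entries: two templates (clusters) for "中" and one for "内".
ranked = [
    DictionaryEntry(12, "中", ord("中"), 0, (0.1, 0.4)),
    DictionaryEntry(57, "中", ord("中"), 0, (0.2, 0.3)),
    DictionaryEntry(3, "内", ord("内"), 0, (0.6, 0.1)),
]
print(distinct_characters(ranked))  # ['中', '内']
```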

The similar candidate character identification dictionary 102 is also held on a magnetic disk or similar storage. FIG. 4 shows an example. The dictionary 102 describes similar character combinations 402, feature quantities 403, and rules 404. Each similar character combination 402 lists two characters with high similarity.
The feature quantity 403 describes the morphological features used to distinguish these two characters, such as loop position, number of loops, average line density within a predetermined range, convex end point positions, number of convex end points, concave end point positions, and number of concave end points.

The rule 404 describes the criteria, based on these morphological features, for deciding which of the two characters to choose.
FIG. 5 illustrates the morphological features of the similar character combinations "0-6", "4 (closed type)-9", and "8 (slanted type)-9" described in the similar candidate character identification dictionary 102.
FIG. 5(a) shows the relationship between the morphological features of the handwritten characters "0" 501 and "6" 502 and the decision criteria. The horizontal lines 511 and 512 drawn across the upper parts of "0" 501 and "6" 502 cross 2 and 1 strokes, respectively; this count, the line density, serves as a morphological feature. Under the decision criterion, a point is added to "0" when the line density is 1.5 or more and to "6" when it is less than 1.5. Both "0" 501 and "6" 502 contain a loop. For the center of this loop (marked × in the figure), the heights 513 and 514 above the bottom edge of the character's circumscribed rectangle serve as a morphological feature: a point is added to "6" when the center lies low and to "0" when it lies near the middle. Although it is not listed as a morphological feature for the "0-6" pair, the handwritten "6" 502 also has a concave point 515.

FIG. 5(b) shows the relationship between the morphological features of the handwritten characters "4" 503 and "9" 504 and the decision criteria. The handwritten "4" 503 has a right-pointing convex end point 521 at the middle of its right side. The decision criterion adds a point to "4" if the convex end point 521 is present.
FIG. 5(c) shows the relationship between the morphological features of the handwritten characters "8" 505 and "9" 506 and the decision criteria. A point is added to "9" when the lower average line widths 531 and 532 are no greater than the line width of the whole character, and to "8" otherwise. A point is also added to "9" if a concave point 533 opening to the left is present in the middle of the character.
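The FIG. 5 criteria can be held as data: for each registered pair, a list of tests on the morphological features, each naming the character that gains a point. In this sketch the feature names are hypothetical, the loop-center height is assumed to be a fraction of the circumscribed rectangle's height measured from the bottom, and the 0.4 cut-off for "low" is an assumed value; the 1.5 threshold comes from the text.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

Features = Dict[str, float]
# (test on the morphological features, character scored if the test holds,
#  character scored otherwise, or None when the rule gives no point)
Rule = Tuple[Callable[[Features], bool], str, Optional[str]]

PAIR_RULES: Dict[FrozenSet[str], List[Rule]] = {
    frozenset({"0", "6"}): [
        (lambda f: f["upper_line_density"] >= 1.5, "0", "6"),
        # Loop center height as a fraction of the bounding-box height, from the
        # bottom; the 0.4 cut-off for "low" is an assumed value.
        (lambda f: f["loop_center_height"] < 0.4, "6", "0"),
    ],
    frozenset({"4", "9"}): [
        (lambda f: f["right_middle_convex_ends"] > 0, "4", None),
    ],
    frozenset({"8", "9"}): [
        (lambda f: f["lower_avg_line_width"] <= f["avg_line_width"], "9", "8"),
        (lambda f: f["middle_left_concave_points"] > 0, "9", None),
    ],
}


def pair_scores(a: str, b: str, f: Features) -> Optional[Dict[str, int]]:
    """Points for each character of a registered similar pair, or None if unregistered."""
    rules = PAIR_RULES.get(frozenset({a, b}))
    if rules is None:
        return None
    scores = {a: 0, b: 0}
    for test, if_true, if_false in rules:
        winner = if_true if test(f) else if_false
        if winner is not None:
            scores[winner] += 1
    return scores


# A FIG. 5(a)-style "0": two strokes crossed near the top, loop centered vertically.
print(pair_scores("6", "0", {"upper_line_density": 2, "loop_center_height": 0.5}))
```

Keying the table on an unordered pair means the lookup works whichever of the two characters is ranked first.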

Feature quantities 403 and rules 404 like those described with reference to FIGS. 5(a) to 5(c) are recorded in the similar candidate character identification dictionary 102 for many similar character combinations 402.
The individual verification dictionary 103 is held on a magnetic disk or similar storage. FIG. 6 shows an example. For each character 602, the dictionary 103 describes positive conditions 603, 605, ..., negative conditions 604, 606, ..., and their rules 607.

FIG. 7 illustrates the contents of the individual verification dictionary 103. FIG. 7(a) shows the handwritten characters "0" 701 and 702. In "0" 701, the center of the loop lies at the center of the character's circumscribed rectangle; there is 1 loop, there are 0 convex end points and 0 concave end points, and the average line density of the central region 711 is 2.
In "0" 702, the average line density of the central region 712 is 3, and there are 2 convex end points, located at the upper right 713 and the lower left 714.

In the individual verification dictionary 103, these morphological features are recorded as items (1) and (2) of the rule 607 that serves as the decision condition for the positive condition 603. The negative condition 604 of the dictionary 103 describes morphological features that are impossible for the handwritten "0" 701 and 702, for example 3 or more loops, or an average line density of 4 or more in the central region.

FIG. 7(b) shows the handwritten characters "4" 703 and 704. The handwritten "4" 703 has one loop 720 and four convex end points, located at the upper right 722, middle left 723, middle right 724, and bottom 725. The average line density is 2 in the upper region 721 and 1 in the lower region 731.
Rule 607 (1) of the positive condition 605 for "4" in the individual verification dictionary 103 states that there is 1 loop; that the convex end point condition is met if 3 or 4 of the 4 convex end points are found; and that the average line density is greater than 1.5 in the upper region and less than 1.5 in the lower region.

The handwritten "4" 704 has no loop and five convex end points, located at the upper left 726, upper right 729, middle left 727, middle right 728, and bottom 730. The average line density is 2 in the upper region 721 and 1 in the lower region 731.
Rule 607 (2) of the positive condition 605 for "4" states that there are 0 loops; that the convex end point condition is met if 4 or 5 of the 5 convex end points are found; and that the average line density is greater than 1.5 in the upper region and less than 1.5 in the lower region.

Rule 607 of the negative condition 606 in the individual verification dictionary 103 describes morphological features that are impossible for the handwritten "4" 703 and 704: 6 or more convex end points, an average line density of 3 or more in the upper region, and an average line density of 2 or more in the lower region.
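The rules for "4" can be written as two predicates. In this sketch, convex end points are represented by coarse position labels whose names are hypothetical. The text lists the negative items without saying whether they must all hold, so the sketch assumes any single item fires the negative condition; `Morph`, `positive_4`, and `negative_4` are illustrative names.

```python
from dataclasses import dataclass
from typing import List

# Convex end point position labels for FIG. 7(b); the label names are hypothetical.
EXPECTED_CLOSED_4 = {"upper_right", "middle_left", "middle_right", "bottom"}             # 703
EXPECTED_OPEN_4 = {"upper_left", "upper_right", "middle_left", "middle_right", "bottom"}  # 704


@dataclass
class Morph:
    loops: int
    convex_ends: List[str]  # one position label per detected convex end point
    upper_density: float    # average line density of the upper region
    lower_density: float    # average line density of the lower region


def positive_4(m: Morph) -> bool:
    found = set(m.convex_ends)
    rule1 = m.loops == 1 and len(found & EXPECTED_CLOSED_4) >= 3  # 3 or 4 of 4
    rule2 = m.loops == 0 and len(found & EXPECTED_OPEN_4) >= 4    # 4 or 5 of 5
    return (rule1 or rule2) and m.upper_density > 1.5 and m.lower_density < 1.5


def negative_4(m: Morph) -> bool:
    # Assumption: each listed impossibility is enough on its own to fire the rule.
    return len(m.convex_ends) >= 6 or m.upper_density >= 3 or m.lower_density >= 2


char_703 = Morph(1, ["upper_right", "middle_left", "middle_right", "bottom"], 2.0, 1.0)
char_704 = Morph(0, ["upper_left", "upper_right", "middle_left", "middle_right", "bottom"], 2.0, 1.0)
print(positive_4(char_703), positive_4(char_704), negative_4(char_703))  # True True False
```

Allowing one missing end point in each positive rule tolerates a stroke end that the end point detector fails to find.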
The document image input receiving unit 104 consists of an optical reader such as a scanner. It accepts a document image from the user, converts it into binary image data, cuts character lines out of the document image, and outputs them to the character image extraction unit 105.

The character image extraction unit 105 extracts character images from the character lines input from the document image input receiving unit 104 and outputs them to the feature quantity extraction unit 106, the morphological feature extraction unit 108, and the recognition verification unit 112.
FIG. 8 shows the state in which the character image extraction unit 105 has extracted the character images 803 to 806 of the character string 802 on the first line of the document image 801 received by the document image input receiving unit 104.
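The text does not say how character images are cut out of a line. A common simple technique is vertical projection: columns with no ink separate characters. The sketch below uses that technique as a stand-in; it is not the patent's stated method, and `column_spans` and `extract_characters` are hypothetical names.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary text-line image, 1 = ink


def column_spans(line: Image) -> List[Tuple[int, int]]:
    """[start, end) column ranges of ink separated by all-blank columns."""
    width = len(line[0])
    has_ink = [any(row[x] for row in line) for x in range(width)]
    spans, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            spans.append((start, x))
            start = None
    if start is not None:
        spans.append((start, width))
    return spans


def extract_characters(line: Image) -> List[Image]:
    """Cut the line into one sub-image per column span."""
    return [[row[x0:x1] for row in line] for x0, x1 in column_spans(line)]


line = [
    [1, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 1, 0, 1],
]
print(column_spans(line))  # [(0, 2), (4, 5), (6, 7)]
```

Plain projection splits kanji whose components are separated by a blank column and merges handwritten characters that touch, so a practical extractor would need merging and splitting heuristics on top of it.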

The feature quantity extraction unit 106 extracts a feature quantity from each of the character images 803 to 806 and passes it to the candidate character selection unit 107 and the similarity calculation unit 111.
The candidate character selection unit 107 calculates the similarity between each received feature quantity and the reference feature quantities in the classification dictionary 101, and selects N candidate characters in descending order of similarity. The similarity can be derived from, for example, the city-block distance or the Euclidean distance. Candidate selection is described in detail in Patent Document 1 cited above.
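Top-N selection with either distance measure is sketched below. A smaller distance means a higher similarity, so taking the N smallest distances gives the candidates in descending order of similarity. The dictionary layout as (cluster, character, reference feature) tuples and the function names are assumptions for illustration.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple

Metric = Callable[[Sequence[float], Sequence[float]], float]


def city_block(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_candidates(
    feature: Sequence[float],
    dictionary: List[Tuple[int, str, Sequence[float]]],  # (cluster, character, reference)
    n: int = 10,
    metric: Metric = city_block,
) -> List[Tuple[int, str, float]]:
    """Return the n nearest dictionary entries as (cluster, character, distance),
    i.e. in descending order of similarity."""
    scored = [(metric(feature, ref), cluster, char) for cluster, char, ref in dictionary]
    return [(cluster, char, d) for d, cluster, char in heapq.nsmallest(n, scored)]


dictionary = [(1, "中", (0.0, 0.0)), (2, "中", (1.0, 1.0)), (3, "内", (5.0, 5.0))]
print(select_candidates((0.0, 1.0), dictionary, n=2))  # [(1, '中', 1.0), (2, '中', 1.0)]
```

As in FIG. 9, the same character can appear twice under different cluster numbers. `heapq.nsmallest` avoids sorting the whole dictionary when N is much smaller than the number of templates.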

FIG. 9 shows an example of the candidate characters selected by the candidate character selection unit 107, with N = 10. Because a multi-template dictionary is used, as described above, the same character can appear under several cluster numbers 901. The similarity 902 is expressed as an integer representing the value to two decimal places. The first candidate characters for the character images 803 to 806 of the character line 802 are "中", "内", "里", and "穂", respectively.

The candidate character selection unit 107 passes the selected candidate characters to the candidate character determination unit 109 and the recognition verification unit 112.
The morphological feature extraction unit 108 holds the character images input from the character image extraction unit 105 in a storage area. On instruction from the candidate character determination unit 109, it extracts morphological features from a character image and returns them to the candidate character determination unit 109. Likewise, when it receives the rules for a character's positive and negative conditions from the individual character verification unit 110, it extracts the morphological features those rules require from the corresponding character image and returns them to the individual character verification unit 110.

The candidate character determination unit 109 forms pairs from the first- to N-th-ranked candidate characters received from the candidate character selection unit 107 and determines, for each pair, whether the two candidates are the same character (that is, the same character from different clusters). If they are not, it determines whether the pair is listed among the similar character combinations 402 of the similar candidate character identification dictionary 102. If it is, the unit reads out that dictionary data, passes the corresponding feature quantities 403 to the morphological feature extraction unit 108, and instructs it to extract those morphological features from the candidate's character image. It then compares the morphological features returned by the morphological feature extraction unit 108 with the rules 404 of the dictionary 102 to decide which of the similar characters is favored.

For example, suppose that for the handwritten "0" 501 in FIG. 5(a) the first candidate character is "6" and the second is "0". Under the rules 404, the upper line density is 2 and the center of the largest loop lies near the middle, so the criteria add points to "0", which is therefore favored. The candidate character determination unit 109 accordingly changes the first candidate character to "0" and the second to "6", and notifies the individual character verification unit 110 of this result.
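The determination step can be sketched as a pass over the ranked list that compares the current first candidate with each lower candidate, skips same-character pairs and unregistered pairs, and promotes the lower candidate when the pair rules give it more points. Comparing against the current top only is a simplification; the text says pairs are formed from the first- to N-th-ranked candidates without fixing the order. The scoring function is passed in, and `demo_scores` is a hypothetical stand-in for the FIG. 5(a) outcome.

```python
from typing import Callable, Dict, List, Optional

# score_pair(a, b) returns points per character for a pair registered in the similar
# candidate character identification dictionary, or None if the pair is not registered.
ScorePair = Callable[[str, str], Optional[Dict[str, int]]]


def determine_candidates(ranked: List[str], score_pair: ScorePair) -> List[str]:
    """Promote a lower-ranked candidate to first place when the pair rules favor it."""
    out = list(ranked)
    for j in range(1, len(out)):
        top, other = out[0], out[j]
        if top == other:        # same character from a different cluster: nothing to decide
            continue
        scores = score_pair(top, other)
        if scores is None:      # not a registered similar pair
            continue
        if scores[other] > scores[top]:
            out[0], out[j] = other, top
    return out


def demo_scores(a: str, b: str) -> Optional[Dict[str, int]]:
    # Stand-in for the FIG. 5(a) result: both criteria favor "0".
    return {"0": 2, "6": 0} if {a, b} == {"0", "6"} else None


print(determine_candidates(["6", "0", "8"], demo_scores))  # ['0', '6', '8']
```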

When the individual character verification unit 110 is notified by the candidate character determination unit 109 of the determined candidate characters, it reads the rule 607 of positive and negative conditions listed for the corresponding character 602 in the individual verification dictionary 103 and passes it to the morphological feature quantity extraction unit 108. On receiving the morphological feature quantities from the morphological feature quantity extraction unit 108, the individual character verification unit 110 compares them with the positive-condition and negative-condition rules and determines whether each condition holds. If the negative condition holds for the first candidate character, the unit “rejects” the first candidate character; if the negative condition does not hold and the positive condition does, it marks the first candidate character “valid” and ends the process.
If neither condition holds for the first candidate character, the unit reads the rule 607 of positive and negative conditions for the second candidate character from the individual verification dictionary 103 and passes it to the morphological feature quantity extraction unit 108. On receiving the morphological feature quantities, it again compares them with the rule and determines whether each condition holds. If the negative condition does not hold and the positive condition does, the first candidate character is “rejected”. If the negative condition holds, or if neither condition holds, the same processing is applied to the third and lower candidate characters. If all candidate characters are exhausted in this way without a decision, the first candidate character is verified as valid.
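A minimal sketch of the decision logic just described, assuming each dictionary entry is a pair of predicate functions (positive, negative) over a feature dictionary. The names `verify_first_candidate` and `reject_if_undecided`, and the toy rules for “4” and “9”, are hypothetical; skipping candidates that have no rule is an assumption the text does not address.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical representation: a condition is a predicate over extracted features.
Condition = Callable[[dict], bool]


def verify_first_candidate(
    candidates: Sequence[str],
    features: dict,
    rules: Dict[str, Tuple[Condition, Condition]],  # char -> (positive, negative)
    reject_if_undecided: bool = False,
) -> str:
    """Return "valid" or "rejected" for the first candidate character."""
    for rank, char in enumerate(candidates, start=1):
        if char not in rules:
            continue  # assumption: candidates without a rule are skipped
        positive, negative = rules[char]
        neg_holds = negative(features)
        pos_holds = positive(features)
        if rank == 1:
            if neg_holds:
                return "rejected"      # first candidate contradicted
            if pos_holds:
                return "valid"         # first candidate confirmed
        else:
            if neg_holds:
                continue               # lower candidate ruled out; try the next one
            if pos_holds:
                return "rejected"      # a lower candidate fits, so doubt the first
    # No decision after all candidates: valid by default, rejected in the stricter variant.
    return "rejected" if reject_if_undecided else "valid"


# Toy rules for illustration only.
rules = {
    "4": (lambda f: f["loops"] == 0, lambda f: f["loops"] >= 2),
    "9": (lambda f: f["loops"] == 1, lambda f: f["loops"] == 0),
}
print(verify_first_candidate(["4", "9"], {"loops": 1}, rules))  # → rejected
```

The `reject_if_undecided` flag corresponds to the stricter variant mentioned below, in which an undecided first candidate is rejected rather than accepted.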

From the standpoint of preventing misrecognition as far as possible, the first candidate character may instead be “rejected” when neither the negative nor the positive condition holds for any of the candidate characters.
As a concrete example, when the first candidate character for the character image 704 in FIG. 7(b) is “4”, the individual character verification unit 110 reads the positive-condition and negative-condition rules for the character “4” from the individual verification dictionary 103 and passes them to the morphological feature quantity extraction unit 108. From the character image 704 held in its storage area, the morphological feature quantity extraction unit 108 extracts a loop count of “0”, a convex end point count of “5”, an upper-region average line density of “2”, and a lower-region average line density of “1”, and notifies the individual character verification unit 110 of these values. Because these values match positive condition (2) of rule 607 (0 loops, 5 convex end points, upper-region average line density of 1.5 or more, lower-region average line density of 1.5 or less), the individual character verification unit 110 determines that the positive condition holds. It also determines that the negative conditions (6 or more convex end points, upper-region average line density of 3 or more, lower-region average line density of 2 or more) do not hold, and therefore verifies the first candidate character “4” as “valid”.
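The check on character image 704 can be reproduced directly from the values given in the text. In this sketch the feature key names are hypothetical, and combining the negative-condition clauses with OR (any one clause suffices) is an assumption, since the text does not say how they combine; for this image every clause is false, so the result is the same either way.

```python
def positive_rule2_for_4(f: dict) -> bool:
    """Positive condition (2) for "4", as listed in the text."""
    return (
        f["loops"] == 0
        and f["convex_end_points"] in (4, 5)  # 4 or 5 tolerates a missed end point
        and f["upper_density"] >= 1.5
        and f["lower_density"] <= 1.5
    )


def negative_for_4(f: dict) -> bool:
    """Negative condition for "4"; OR-combining the clauses is an assumption."""
    return (
        f["convex_end_points"] >= 6
        or f["upper_density"] >= 3
        or f["lower_density"] >= 2
    )


# Morphological features reported for character image 704 in FIG. 7(b).
image_704 = {"loops": 0, "convex_end_points": 5, "upper_density": 2, "lower_density": 1}

pos, neg = positive_rule2_for_4(image_704), negative_for_4(image_704)
status = "rejected" if neg else ("valid" if pos else "undecided")
print(pos, neg, status)  # → True False valid
```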

In rule 607 (2), the convex end point count is set to 4 or 5 to allow for cases in which the morphological feature quantity extraction unit 108 cannot extract every convex end point. Likewise, positive condition (1) for the character “4” sets the convex end point count to 3 or 4 for the same reason.
The individual character verification unit 110 notifies the recognition verification unit 112 of the verification result.

The similarity calculation unit 111 stores in its storage area the feature quantities of the character images notified by the feature quantity extraction unit 106. When the recognition verification unit 112 designates a candidate character to be verified, the similarity calculation unit 111 calculates the similarity between the feature quantities of that character image and those of the other characters on the same line, and notifies the recognition verification unit 112 of the calculated similarities. The similarity is calculated using, for example, the city-block (Manhattan) distance and is normalized to the range 0 to 1, with larger values indicating higher similarity.
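The text names the city-block distance but not the normalization. One possible sketch, assuming every feature is already scaled to [0, 1] so that the distance can be divided by the vector length; the function name is hypothetical.

```python
from typing import Sequence


def city_block_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity in [0, 1] derived from the city-block (L1) distance.

    Assumes every feature value lies in [0, 1], so the largest possible
    distance equals the number of features.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    if not a:
        return 1.0
    distance = sum(abs(x - y) for x, y in zip(a, b))
    return 1.0 - distance / len(a)  # 1.0 = identical, 0.0 = maximally different
```

Dividing by the maximum possible distance keeps larger values meaning "more similar", as the text requires; other normalizations (for example 1 / (1 + d)) would also satisfy that.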

To decide whether the first candidate characters verified by the individual character verification unit 110 should be adopted as recognition results, the recognition verification unit 112 performs inter-character verification over a series of character images, such as the entire document image, a single line, or a single field of a form. This verification is based on the position and shape of each character and on the similarity between characters.
The recognition verification unit 112 stores each character image input from the character image extraction unit 105 in its storage area, determines the width and height of each character image, and calculates the average character width and average character height. It then rejects any first candidate character whose character image exceeds these averages by more than a predetermined threshold, or falls at or below a predetermined threshold relative to them.

FIG. 10 shows a character string image 1001 written on one line. For each of the character images C1011 to C1018, the unit checks whether its width exceeds the average character width 1003 by more than the threshold, or its height exceeds the average character height 1004 by more than the threshold. Character image C1011 is rejected because both its width and its height exceed the average character width 1003 and the average character height 1004 by more than the threshold. Character image C1017 is rejected because both its width and its height fall at or below the threshold relative to the average character width 1003 and the average character height 1004.

The recognition verification unit 112 also determines an average character position 1002 (the vertical position for horizontal writing, or the horizontal position for vertical writing) and rejects character images that deviate greatly from it. For example, character image C1018 is rejected because its position deviates greatly.
However, a character image whose width falls at or below the threshold relative to the average character width 1003, or whose height falls at or below the threshold relative to the average character height 1004, is not rejected if its first candidate character is a symbol such as “。”, “‐”, or “・”.
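The size and position checks above can be sketched for one horizontal line as follows. The ratio thresholds (1.5×, 0.5×, and a shift of half the average height), the `CharBox` type, and the exempt symbol set are all hypothetical; the text only says "predetermined threshold". Small characters skip the position check here because, in the flowchart described later, they receive position processing instead.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List

# Hypothetical set of symbols exempt from small-size rejection.
SMALL_SYMBOLS = {"。", "‐", "・", ".", "-"}


@dataclass
class CharBox:
    label: str   # first candidate character
    cy: float    # vertical centre of the box (horizontal writing)
    w: float     # width
    h: float     # height


def check_line(boxes: List[CharBox], large: float = 1.5, small: float = 0.5,
               max_shift: float = 0.5) -> List[str]:
    """Size and position checks for one line; the ratio thresholds are illustrative."""
    avg_w = mean(b.w for b in boxes)
    avg_h = mean(b.h for b in boxes)
    avg_cy = mean(b.cy for b in boxes)
    results = []
    for b in boxes:
        too_large = b.w > large * avg_w or b.h > large * avg_h
        too_small = b.w < small * avg_w or b.h < small * avg_h
        if too_large or (too_small and b.label not in SMALL_SYMBOLS):
            results.append("rejected")   # size check
        elif not too_small and abs(b.cy - avg_cy) > max_shift * avg_h:
            results.append("rejected")   # position check (normal-size characters only)
        else:
            results.append("valid")
    return results
```

Note that, as in the text, the averages are taken over all characters in the line, including the outliers being tested.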

In addition, position processing is performed: characters of identical shape, such as “.” and “・” or “_” and “―”, are distinguished by their position within the line, and the first candidate character is corrected as necessary.
Next, the recognition verification unit 112 performs inter-character verification based on the similarity between characters.
When the recognition verification unit 112 is notified by the individual character verification unit 110 of a first candidate character together with the recognition result “valid”, it notifies the similarity calculation unit 111 of the candidate character to be verified. It then instructs the similarity calculation unit 111 to calculate, using the feature quantities notified by the feature quantity extraction unit 106, the similarity between that character and the other characters in the same line.
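The position processing for same-shape symbols mentioned above can be sketched for horizontal writing, where a period or underscore sits near the bottom of the line and a middle dot or dash sits near mid-height. The symbol pairs, the 0.75 cut-off, and the function name are hypothetical.

```python
# Hypothetical pairs of same-shape symbols: (label when low in the line, label when mid-height).
SAME_SHAPE_PAIRS = [(".", "・"), ("_", "―")]


def correct_by_position(label: str, cy: float, line_top: float, line_bottom: float,
                        low_ratio: float = 0.75) -> str:
    """Relabel a same-shape symbol from its vertical position (y grows downward)."""
    for low_label, mid_label in SAME_SHAPE_PAIRS:
        if label in (low_label, mid_label):
            rel = (cy - line_top) / (line_bottom - line_top)  # 0 = top, 1 = bottom
            return low_label if rel > low_ratio else mid_label
    return label  # not a same-shape symbol: leave unchanged


print(correct_by_position("・", cy=38, line_top=0, line_bottom=40))  # → .
```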

On obtaining the similarity results from the similarity calculation unit 111, the recognition verification unit 112 multiplies each of them by the similarity notified by the candidate character selection unit 107.
A specific example is described with reference to FIGS. 11 and 12.
FIG. 11 shows character images C1111 to C1114 of a character string 1101 written on one line. As shown in FIG. 12, the first candidate characters selected and verified for these character images by the candidate character selection unit 107, the candidate character determination unit 109, and the individual character verification unit 110 are “4”, “6”, “5”, and “6”, respectively. The similarities R to the reference feature quantities of the classification dictionary 101 when these first candidates were selected by the candidate character selection unit 107 are “0.7”, “0.5”, “0.6”, and “0.8”, respectively. All of them were verified as “valid” by the individual character verification unit 110.

Now let character image C1112 be the character to be verified (C). C1112 is chosen as the verification target (C) because character image C1114 in the same character string 1101 has been selected with the same candidate character, “6”, as character (C).
The similarity calculation unit 111 calculates the similarity r between character image C1112 and each of the other character images C1111, C1113, and C1114. The resulting similarities r are “0.1”, “0.75”, and “0.3”, respectively.

The recognition verification unit 112 takes character image C1113, which has the largest similarity r, as character (C1), and character image C1114, which was selected with the same candidate character as character (C), as character (C2). If characters (C1) and (C2) had the same candidate character, the candidate character of the verification target (C) would be accepted as valid. Here, however, character (C1) is “5” and character (C2) is “6”.

The recognition verification unit 112 calculates the r·R values of characters (C1) and (C2) as 0.75 × 0.6 = “0.45” and 0.3 × 0.8 = “0.24”, respectively. If r·R of (C1) is greater than r·R of (C2), the recognition verification unit 112 changes the recognition result of the verification target (C) to the candidate character of (C1); if r·R of (C1) is less than or equal to r·R of (C2), it rejects the verification target (C).
In this case, because r·R of (C1) exceeds r·R of (C2), the candidate character “6” of character image C1112 is changed to “5”, the candidate character of (C1), and the verification result is set to “valid”.

The recognition verification unit 112 notifies the recognition result output unit 113 of the first candidate characters determined to be “valid” as the recognition verification result, and issues an error notification for first candidate characters determined to be “rejected”.
The recognition result output unit 113 outputs the first candidate characters notified by the recognition verification unit 112 as the recognition result, for example by displaying them on a display or by outputting them as character codes to an external device. For characters with an error notification, it indicates that they could not be read.

Next, the operation of the above embodiment is described with reference to flowcharts.
FIG. 13 is a flowchart of the operations up to the selection of candidate characters.
First, the document image input receiving unit 104 accepts a document image input by the user, converts it into binary image data (S1302), detects character lines in the document image, and outputs them to the character image extraction unit 105 (S1304). The character image extraction unit 105 detects character images in the input character line (S1306), stores them in the storage areas of the morphological feature quantity extraction unit 108 and the recognition verification unit 112, and notifies the feature quantity extraction unit 106 of them (S1308).

The feature quantity extraction unit 106 extracts feature quantities from the character image, stores them in the similarity calculation unit 111, and notifies the candidate character selection unit 107 of them (S1312).
The candidate character selection unit 107 calculates the similarity between the notified feature quantities and the reference feature quantities registered in the classification dictionary 101, stores the calculated similarities and the corresponding candidate characters in the recognition verification unit 112 (S1316), and selects N candidate characters in descending order of similarity (S1318). It notifies the candidate character determination unit 109 of the selection result (S1320) and determines whether all characters in the line have been processed (S1322). If not, the process returns to S1306; if so, it determines whether all lines of the document image have been processed (S1324). If not, the process returns to S1304; otherwise, it ends.
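The selection step of FIG. 13 (S1316 to S1318), wrapped in its line and document loops, could look like this. The sketch assumes that the multi-template classification dictionary 101 maps each character to a list of reference vectors and that similarity is a normalized city-block measure; all names are hypothetical. Because each template is scored separately, the same character can appear twice among the candidates, which is the situation the candidate character determination unit 109 checks for.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def city_block_similarity(a: Vector, b: Vector) -> float:
    """Assumes features scaled to [0, 1]; 1.0 means identical."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_candidates(feature: Vector, dictionary: Dict[str, List[Vector]], n: int,
                      similarity: Callable[[Vector, Vector], float] = city_block_similarity
                      ) -> List[Tuple[str, float]]:
    """S1316-S1318: score every reference vector and keep the n most similar."""
    scored = [(char, similarity(feature, ref))
              for char, refs in dictionary.items() for ref in refs]
    return heapq.nlargest(n, scored, key=lambda item: item[1])


def recognise_document(lines: List[List[Vector]], dictionary: Dict[str, List[Vector]],
                       n: int) -> List[List[List[Tuple[str, float]]]]:
    """Outer loops of FIG. 13: every character image of every line (S1304-S1324)."""
    return [[select_candidates(f, dictionary, n) for f in line] for line in lines]


dictionary = {"0": [[0.0, 1.0], [0.1, 0.9]], "6": [[1.0, 0.0]]}
print([c for c, _ in select_candidates([0.0, 1.0], dictionary, 2)])  # → ['0', '0']
```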

FIG. 14 is a flowchart of the operation of the candidate character determination unit 109.
When the candidate character determination unit 109 is notified by the candidate character selection unit 107 of the first to Nth candidate characters for one character image, it determines whether a combination of candidate characters consists of the same character (S1402). If so, the process moves to S1414; if not, the unit determines whether the combination is among the similar character combinations in the similar candidate character identification dictionary 102 (S1404). If it is not, the process moves to S1414; if it is, the corresponding dictionary data is read (S1406).

The unit has the morphological feature quantity extraction unit 108 extract the morphological features of the character image (S1408), collates the extracted features with the rules in the dictionary data (S1410), and determines which candidate character is favored (S1412). If neither can be determined to be favored, the process moves to S1414; if one is determined to be favored, that candidate character is made the first candidate character (S1416) and the process ends.

In S1414, the unit determines whether any other combination of candidate characters remains. If so, the process returns to S1402; if not, it ends.
Through this processing, when two highly similar candidate characters among the N candidate characters selected by the candidate character selection unit 107 for one character image are listed in the similar candidate character identification dictionary 102, their morphological features are used to determine which candidate character better matches the character image.
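The FIG. 14 loop can be sketched as follows, assuming the similar candidate character identification dictionary 102 is represented as a mapping from an unordered character pair to a rule function that returns the favored character or `None`. The types and names are hypothetical, and moving the favored character to the front while keeping the others in order is one reading of S1416.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Optional

# Hypothetical rule type: returns the favoured character, or None if undecided.
PairRule = Callable[[dict], Optional[str]]


def determine_candidates(candidates: List[str],
                         pair_rules: Dict[FrozenSet[str], PairRule],
                         features: dict) -> List[str]:
    """FIG. 14: reorder the ranked candidates using similar-pair rules."""
    for a, b in combinations(candidates, 2):        # S1402 / S1414: next pair
        if a == b:
            continue                                # same character, different clusters
        rule = pair_rules.get(frozenset((a, b)))    # S1404: listed similar pair?
        if rule is None:
            continue
        winner = rule(features)                     # S1406-S1412
        if winner is not None:
            i = candidates.index(winner)
            return [winner] + candidates[:i] + candidates[i + 1:]  # S1416
    return list(candidates)                         # no rule decided: order unchanged


rules = {frozenset(("0", "6")): lambda f: "0" if f["upper_density"] >= 2 else None}
print(determine_candidates(["6", "6", "0", "8"], rules, {"upper_density": 2}))
# → ['0', '6', '6', '8']
```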

FIG. 15 is a flowchart of the operation of the individual character verification unit 110.
When the individual character verification unit 110 is notified by the candidate character determination unit 109 of the first to Nth candidate characters for one character image, it initializes a counter N to “1” (S1502).
Next, it reads the rule for the Nth candidate character from the individual verification dictionary 103 (S1504), has the morphological feature quantity extraction unit 108 extract the morphological features named in the rule from the character image (S1506), and compares the extracted features with the positive-condition and negative-condition rules of the individual verification dictionary 103 (S1508). It then determines whether the counter N is 1 (S1510). If it is, the unit determines whether the negative condition holds (S1512); if it does, the first candidate character is “rejected” (S1514) and the process ends.

If the negative condition does not hold, the unit determines whether the positive condition holds (S1516). If it does, the first candidate character is made “valid” (S1518) and the process ends.
If the positive condition does not hold, the process moves to S1520, where 1 is added to the counter N. The unit then determines whether an Nth candidate character exists (S1522); if one does, the process returns to S1504.

If the counter N is not 1 in S1510, the unit determines whether the negative condition holds (S1524). If it does, the process moves to S1520; if not, the unit determines whether the positive condition holds (S1526). If it does, the process moves to S1514; if not, it moves to S1520.
If there is no Nth candidate character in S1522, the first candidate character is made “valid” (S1528) and the process ends.

In this embodiment, the first candidate character is made “valid” when neither the negative nor the positive condition holds for it and neither holds for the second and lower candidate characters either; it may instead be “rejected”.
Next, the operation of the recognition verification unit 112 is described with reference to the flowcharts of FIGS. 16 to 18.
First, the recognition verification unit 112 calculates the average character width and average character height of the character images input from the character image extraction unit 105 (S1602), calculates the width and height of the character of interest (S1604), and evaluates their difference from the averages (S1606). It determines whether the character size exceeds the average by more than a predetermined threshold (S1608); if it does, the character of interest is rejected (S1610) and the process moves to S1616.

If the character size does not exceed that threshold, the unit further determines whether it is at or below a predetermined threshold on the small side (S1612). If it is, the unit determines whether the recognition result is a symbol such as “。” or “・” (S1614). If it is not a symbol, the process moves to S1610; if it is, the process moves to S1616.
In S1616, the unit determines whether all characters have been processed. If so, the process moves to S1702; if not, it returns to S1604.

In S1702, the recognition verification unit 112 calculates the average character position for the character string on the line. Next, it determines whether the character of interest is small (S1704). If it is not small, the unit calculates the displacement of the character of interest from the average character position (S1706) and evaluates it (S1708). It then determines whether the displacement is large (S1710); if so, the character of interest is rejected (S1712) and the process moves to S1714. If the displacement is small, the process also moves to S1714.

If the character is determined to be small, position processing is performed (S1718), and the unit then determines whether all characters have been processed (S1714). If not, the process returns to S1704; if so, it moves to S1802.
In S1802, when the same character string contains character images recognized as the same first candidate character, the recognition verification unit 112 takes such a character image as the character of interest (C) and has the similarity calculation unit 111 calculate the similarity r between the character image of (C) and each of the other character images from their feature quantities. The candidate character of the character image with the highest similarity is then identified and taken as character (C1) (S1804).

Among the characters selected with the same candidate character as the character of interest (C), the one with the highest similarity is taken as character (C2) (S1806). The unit determines whether characters (C1) and (C2) are the same (S1808). If they are, it determines whether character (C1) has been “rejected” (S1810); if it has not, character (C) is made “valid” (S1812) and the process moves to S1820.

If they are not the same in S1808, the unit determines whether character (C1) is “valid” and the r·R of character (C1) is greater than the r·R of character (C2) (S1814). If so, the first candidate character of the character of interest (C) is changed to the recognition result (candidate character) of character (C1) (S1816), and the process moves to S1820. If not, the first candidate character of the character of interest (C) is “rejected” (S1818). The unit then determines whether all characters have been processed (S1820); if so, the process ends, and if characters remain, it returns to S1802.
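Steps S1804 to S1818 can be sketched for one target character and checked against the worked example with character images C1111 to C1114. The `Recognised` type, the function name, and the representation of r as a dictionary keyed by position in the line are hypothetical. The flowchart does not say what happens when (C1) and (C2) agree but (C1) was rejected; this sketch leaves the target unchanged in that case, which is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Recognised:
    label: str             # first candidate character
    R: float               # similarity to the dictionary reference (unit 107)
    status: str = "valid"  # "valid" or "rejected"


def verify_target(chars: List[Recognised], t: int, r: Dict[int, float]) -> None:
    """Inter-character check of FIG. 18 for target index t (updates chars[t] in place).

    r maps the index of every other character in the line to its similarity with the target.
    """
    others = [i for i in range(len(chars)) if i != t]
    same = [i for i in others if chars[i].label == chars[t].label]
    if not same:
        return                                  # no repeated label: nothing to verify
    c1 = max(others, key=lambda i: r[i])        # S1804: most similar image
    c2 = max(same, key=lambda i: r[i])          # S1806: most similar same-label image
    target, a, b = chars[t], chars[c1], chars[c2]
    if a.label == b.label:                      # S1808
        if a.status != "rejected":              # S1810
            target.status = "valid"             # S1812
        return                                  # assumption: otherwise unchanged
    if a.status == "valid" and r[c1] * a.R > r[c2] * b.R:  # S1814
        target.label = a.label                  # S1816
        target.status = "valid"
    else:
        target.status = "rejected"              # S1818


# Worked example: C1111..C1114 with first candidates 4, 6, 5, 6.
line = [Recognised("4", 0.7), Recognised("6", 0.5), Recognised("5", 0.6), Recognised("6", 0.8)]
verify_target(line, 1, {0: 0.1, 2: 0.75, 3: 0.3})
print(line[1])  # → Recognised(label='5', R=0.5, status='valid')
```

Here r·R is 0.75 × 0.6 = 0.45 for (C1) against 0.3 × 0.8 = 0.24 for (C2), so the “6” of C1112 is relabeled “5”, matching the example in the text.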

Although this embodiment verifies the certainty of recognition of the first candidate character using the similarity r between character images in the same line, the similarity r between character images across the entire document image may be used instead.
Also, although in this embodiment one of the character images selected with the same candidate character within a line is taken as the character of interest when several such images exist, every character of the document image may instead be taken as the character of interest in turn, from the first character to the last.

Furthermore, when the recognition results of the candidate characters mix printed and handwritten characters, or mix alphanumeric characters with kanji, hiragana, and katakana, verification may be performed according to the attribute of each recognition result.
Also, although the classification dictionary 101 has been described in this embodiment as a multi-template dictionary, it may be a single-template dictionary in which one reference feature quantity is registered per character.

In this embodiment the candidate character selection unit 107 selects N candidate characters, but a predetermined similarity may instead be set, and the candidate characters whose similarity exceeds it selected. In this case as well, at least one candidate character is selected.
Also, in this embodiment the recognition verification unit 112 verifies the candidate characters determined to be “valid” by the individual character verification unit 110, but it may instead operate directly on the first candidate characters selected by the candidate character selection unit 107, using the similarity r between character images to change them to another first candidate character.
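The threshold-based selection variant can be sketched as follows, assuming candidates arrive as (character, similarity) pairs; the function name is hypothetical. Falling back to the single best candidate is one way to honor the requirement that at least one candidate is always selected.

```python
from typing import List, Tuple


def select_above_threshold(scored: List[Tuple[str, float]],
                           threshold: float) -> List[Tuple[str, float]]:
    """Keep candidates whose similarity exceeds threshold, best first; never empty."""
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    kept = [item for item in ranked if item[1] > threshold]
    return kept if kept else ranked[:1]  # guarantee at least one candidate


print(select_above_threshold([("a", 0.3), ("b", 0.8), ("c", 0.6)], 0.9))  # → [('b', 0.8)]
```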

In the present embodiment, the individual character verification unit 110 verifies individual characters after receiving the similar-character determination result from the candidate character determination unit 109. Alternatively, it may directly verify the candidate characters selected by the candidate character selection unit 107 as "valid" or "rejected".
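The rule-based verification can be illustrated with a small sketch in which each character has affirmative conditions (morphological features it should have) and negative conditions (features it should not have), in the spirit of the individual verification dictionary of claim 2. The feature names (`loops`, `end_points`), the example rule for "0", and the three-way outcome including "undetermined" are hypothetical choices made for illustration; only rejection on a negative condition comes from the source.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Features = Dict[str, float]            # morphological feature name -> value (assumed)
Condition = Callable[[Features], bool]


@dataclass
class VerificationRule:
    affirmative: List[Condition] = field(default_factory=list)
    negative: List[Condition] = field(default_factory=list)


def verify_candidate(char: str, feats: Features, rules: Dict[str, VerificationRule]) -> str:
    """Hypothetical sketch of individual character verification."""
    rule = rules.get(char)
    if rule is None:
        return "valid"  # no rule registered: leave the candidate as is
    if any(cond(feats) for cond in rule.negative):
        return "rejected"  # a negative condition holds -> reject the candidate
    if all(cond(feats) for cond in rule.affirmative):
        return "valid"
    return "undetermined"  # assumption: neither outcome is conclusive


# Hypothetical individual verification dictionary entry for the digit "0"
RULES = {
    "0": VerificationRule(
        affirmative=[lambda f: f["loops"] == 1],   # a "0" should enclose one loop
        negative=[lambda f: f["end_points"] > 0],  # and should have no open stroke ends
    ),
}
```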
(Other variations)
Of course, the present invention is not limited to the above-described embodiments, and the following modifications can be implemented.
(1) Although the character recognition device is shown in the block diagram of FIG. 1, it is specifically a computer system comprising a microprocessor, ROM, RAM, a hard disk unit, a display unit, a keyboard, a mouse, an optical reader, and the like. A computer program is recorded in the RAM or the hard disk unit. This computer program causes a computer to perform the functions of the units in the above block diagram, and the function of each unit is achieved by the microprocessor operating according to the computer program. Here, the computer program is composed of a combination of multiple instruction codes, each indicating a command to the computer, in order to achieve a predetermined function.
(2) Some or all of the components constituting the character recognition device may be implemented as a single system LSI (Large Scale Integration). A system LSI is a super-multifunctional LSI manufactured by integrating multiple components on a single chip; specifically, it is a computer system including a microprocessor, ROM, RAM, and the like. A computer program is stored in the RAM, and the system LSI achieves its functions by the microprocessor operating according to the computer program.
(3) Some or all of the components constituting the character recognition device may be implemented as an IC card detachable from the device or as a standalone module. The IC card or the module is a computer system including a microprocessor, ROM, RAM, and the like, and may include the super-multifunctional LSI described above. The IC card or the module achieves its functions by the microprocessor operating according to the computer program. The IC card or the module may be tamper-resistant.
(4) The present invention may be the character recognition method performed by the character recognition device, as described in its operation. It may also be a computer program that implements this method on a computer, or a digital signal composed of the computer program.

The present invention may also be the computer program or the digital signal recorded on a computer-readable recording medium, such as a flexible disk, hard disk, CD-ROM, MO, DVD, DVD-ROM, DVD-RAM, BD (Blu-ray Disc), or semiconductor memory. It may also be the digital signal recorded on these recording media.

The present invention may also transmit the computer program or the digital signal via a telecommunication line, a wireless or wired communication line, a network typified by the Internet, data broadcasting, or the like.
The present invention may also be a computer system including a microprocessor and a memory, in which the memory stores the computer program and the microprocessor operates according to the computer program.

The program or the digital signal may also be executed by another independent computer system, either by recording it on the recording medium and transferring it, or by transferring it via the network or the like.
(5) The above embodiment and the above modifications may be combined with one another.

The character recognition device according to the present invention can be used as a character input device for a personal computer equipped with an image reading device. It is useful in office work as an input device for documents in various typefaces and, in particular, for handwritten characters with large variation in shape, for example as a device for entering characters from forms.

A block diagram of one embodiment of the character recognition device according to the present invention.
A diagram showing the data structure of the classification dictionary of the above embodiment.
A diagram showing an example of the classification dictionary of the above embodiment.
A diagram showing an example of the similar candidate character identification dictionary of the above embodiment.
A diagram for explaining the contents of the similar candidate character identification dictionary of the above embodiment.
A diagram showing an example of the individual verification dictionary of the above embodiment.
A diagram for explaining the contents of the individual verification dictionary of the above embodiment.
A diagram showing an example of a document image received by the document image input reception unit of the above embodiment.
A diagram showing an example of the multiple candidate characters selected by the candidate character selection unit of the above embodiment.
A diagram for explaining the rejection processing in the recognition verification unit of the above embodiment.
A diagram showing an example of the character images notified to the recognition verification unit of the above embodiment.
A diagram for explaining the processing in the recognition verification unit of the above embodiment.
A flowchart explaining the operation of the document image input reception unit, character image extraction unit, feature quantity extraction unit, and candidate character selection unit of the above embodiment.
A flowchart explaining the operation of the candidate character determination unit of the above embodiment.
A flowchart explaining the operation of the individual character verification unit of the above embodiment.
A flowchart (part 1) explaining the operation of the recognition verification unit of the above embodiment.
A flowchart (part 2) explaining the operation of the recognition verification unit of the above embodiment.
A flowchart (part 3) explaining the operation of the recognition verification unit of the above embodiment.

Explanation of symbols

101 Classification dictionary
102 Similar candidate character identification dictionary
103 Individual verification dictionary
104 Document image input reception unit
105 Character image extraction unit
106 Feature quantity extraction unit
107 Candidate character selection unit
108 Morphological feature quantity extraction unit
109 Candidate character determination unit
110 Individual character verification unit
111 Similarity calculation unit
112 Recognition verification unit
113 Recognition result output unit

Claims

A character recognition device that receives input of a document image and recognizes each character included in the document image, comprising:
A classification dictionary in which the reference feature quantities of each character are registered;
Character image extraction means for extracting character images from the document image;
Feature quantity extraction means for extracting feature quantities from each extracted character image;
Candidate character selection means for calculating a similarity between the extracted feature quantity and the reference feature quantity, and selecting characters with high similarity as candidate characters;
Similarity calculation means for calculating the similarity between character images in the same document using the feature quantities extracted by the feature quantity extraction means; and
Recognition verification means for verifying the certainty of recognition as a candidate character based on the similarity as a candidate character and the similarity between the character images;
wherein the recognition verification means includes:
A calculation unit that calculates the product of the similarity r to the verification target character, calculated by the similarity calculation means, and the similarity R calculated by the candidate character selection means; and
A recognition result changing unit that, when there is another candidate character whose product is larger than the product for the same candidate character as that selected by the candidate character selection means, changes the recognition result of the verification target character to that other candidate character.

The character recognition device according to claim 1, further comprising:
An individual verification dictionary that describes, as rules for each character, an affirmative condition indicating the morphological feature quantities that the character should have as a candidate character and a negative condition indicating the morphological feature quantities that it should not have; and
Individual character verification means that extracts morphological feature quantities from the character image, determines whether the selected candidate character meets the affirmative condition or the negative condition, and rejects the selection of the candidate character when it is determined that the negative condition is met.

The character recognition device according to claim 1, further comprising:
A similar candidate character identification dictionary describing the feature quantities of two highly similar characters and a rule for distinguishing between the two characters; and
Candidate character determination means that, when candidate characters selected from one character image are registered in the similar candidate character identification dictionary, extracts morphological feature quantities from the character image, checks them against the rule, and determines which of the two characters is to be the candidate character.
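A minimal sketch of how a similar candidate character identification dictionary might be consulted: an unordered pair of confusable characters maps to a rule over morphological features that picks one of them, and the top two candidates are reordered accordingly. The pair ("0", "6"), the `end_points` feature, the rule itself, and the function name `resolve_similar` are hypothetical examples, not entries from the patent.

```python
from typing import Callable, Dict, FrozenSet, List

Features = Dict[str, float]  # morphological feature name -> value (assumed)

# Hypothetical dictionary: confusable character pair -> rule returning the winner
PAIR_RULES: Dict[FrozenSet[str], Callable[[Features], str]] = {
    # A "6" has an open stroke end at the top; a "0" is closed
    frozenset({"0", "6"}): lambda f: "6" if f["end_points"] >= 1 else "0",
}


def resolve_similar(candidates: List[str], feats: Features) -> List[str]:
    """If the top two candidates form a registered pair, reorder them by the rule."""
    if len(candidates) < 2:
        return list(candidates)
    rule = PAIR_RULES.get(frozenset(candidates[:2]))
    if rule is None:
        return list(candidates)  # pair not registered: keep the original order
    winner = rule(feats)
    loser = candidates[1] if winner == candidates[0] else candidates[0]
    return [winner, loser] + list(candidates[2:])
```

Using a `frozenset` key makes the lookup independent of which of the two characters the classifier happened to rank first.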

The character recognition device according to claim 2 or 3, wherein the morphological feature quantities include at least one of: the position or number of loops, the average line density within a predetermined range, the position or number of convex end points, and the position or number of concave end points.
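Two of the listed morphological feature quantities can be computed on a binary character image roughly as follows. This is a sketch under stated assumptions: pixels are 1 for stroke and 0 for background, a loop is counted as a background region (4-connected) that does not touch the image border, and line density is taken as the number of background-to-stroke transitions per row, averaged over a row range. The function names are hypothetical, and convex/concave end points are not covered here.

```python
from collections import deque
from typing import Sequence

Image = Sequence[Sequence[int]]  # 1 = stroke (black), 0 = background (white)


def count_loops(img: Image) -> int:
    """Count enclosed background regions (holes) via 4-connected flood fill."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    loops = 0
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] or seen[sy][sx]:
                continue
            # Flood-fill this background region and note whether it reaches the border
            touches_border = False
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                if y in (0, h - 1) or x in (0, w - 1):
                    touches_border = True
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not img[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if not touches_border:
                loops += 1
    return loops


def average_line_density(img: Image, row_start: int, row_end: int) -> float:
    """Mean number of background-to-stroke transitions per row in [row_start, row_end)."""
    rows = img[row_start:row_end]
    if not rows:
        return 0.0
    total = 0
    for row in rows:
        prev = 0
        for px in row:
            if px and not prev:
                total += 1  # entering a stroke from the background
            prev = px
    return total / len(rows)
```

For example, a "0"-like glyph yields one loop and a density of 2 on its middle row, while an "8"-like glyph yields two loops; such values are the kind of input the rules of claims 2 and 3 would test.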