JP2014115781A

JP2014115781A - Character recognition device and method, and character recognition program

Info

Publication number: JP2014115781A
Application number: JP2012268549A
Authority: JP
Inventors: Toshinori Miyoshi; 利昇三好; Hiroshi Shinjo; 広新庄; Takeshi Nagasaki; 健永崎; Yasutaka Tsutsumi; 庸昂堤
Original assignee: Hitachi Information and Telecommunication Engineering Ltd
Current assignee: Hitachi Information and Telecommunication Engineering Ltd
Priority date: 2012-12-07
Filing date: 2012-12-07
Publication date: 2014-06-26
Anticipated expiration: 2032-12-07
Also published as: CN103870823A; JP6055297B2; CN103870823B

Abstract

PROBLEM TO BE SOLVED: To perform efficient rejection that reduces the misreading rate while limiting the drop in the correct reading rate, and to reduce the amount of calculation needed for rejection determination. SOLUTION: Rejection determinations 109 and 113 based on various indices are combined in series when their rejection values are highly independent of each other, and in parallel when the independence is low. Rejection indices with a high rejection rate and a low calculation cost are placed earlier in the processing. A character identification part 106 recognizes the character in each character-unit image cut out by a character cutting-out part 105. A plurality of rejection value calculations 107, 108, and 110 to 112 are arranged so that the rejection value calculations 107 and 108, which have high rejection performance, come first. When the rejection determination 109 decides on rejection based on the rejection values calculated by the preceding rejection value calculations 107 and 108, the recognition result is rejected and the subsequent rejection value calculations are omitted.

Description

The present invention relates to a character recognition device and method and a character recognition program, and more particularly to an optical character recognition device and method and a character recognition program that use a rejection determination scheme combining a plurality of rejection values. Among character recognition techniques, the present embodiment relates in particular to rejection.

The technical field is optical character recognition (OCR) devices. An OCR device reads a paper document with a scanner or the like, recognizes the characters and symbols in the image, and digitizes the document by encoding them in Unicode or a similar code. General companies, local governments, financial institutions, medical institutions, educational institutions, and the like use OCR devices to digitize accounting slips, payment completion notices, salary reports, order forms, bulk transfer forms, medical fee statements, answer sheets, and so on. For general users, OCR is used for character recognition on mobile phones and for recognizing characters in general documents such as memos.
The flow of document digitization by an OCR device is described below in simplified form.
FIG. 6 is a diagram for explaining the flow of document digitization by a character recognition device. First, the document is converted into an image by a scanner or the like and preprocessed by binarization, noise removal, and so on. This yields a binary document image such as the one indicated by reference numeral 601 in FIG. 6. Next, the OCR device performs layout analysis, which determines the positions of figures and tables and the paragraph structure of the document, and character string extraction, which yields character string images such as the one indicated by reference numeral 602 in FIG. 6. The OCR device then cuts character-unit images out of each character string image and recognizes the character in each of those images. The processing from document imaging to character string extraction is described in, for example, Patent Document 1 and Patent Document 2. The processing from a character string image to the recognition of individual characters is described in, for example, Patent Document 3, Non-Patent Document 1, and Non-Patent Document 2.

The present technology concerns the recognition of individual character images. The following briefly describes how a character drawn in an individual character image is recognized.
First, feature extraction converts the character image into a vector. If the vector has N dimensions, feature extraction represents one character image as an N-dimensional vector. N-dimensional vectors extracted from character images of the same character type are distributed close to one another in the N-dimensional space.
FIG. 9 shows this schematically. Circles, triangles, and squares represent the vectors extracted from character images of character type A, character type B, and character type C, respectively.
Next, the character drawn in the character image is identified from the extracted vector by referring to a character identification dictionary created in advance.
The character identification dictionary is described first. For each character type k to be identified, the dictionary stores, for example, a discriminant function fk(x) that takes an N-dimensional vector as its argument and returns a real value. The discriminant function fk(x) is generated in advance by learning so that it takes a large value for an N-dimensional vector x generated from a character image showing character type k and a small value for an N-dimensional vector x generated from a character image showing any other character type. The value of fk(x) is called the similarity or likelihood of the vector x with respect to character type k. For example, when recognizing digits, there are ten discriminant functions f0(x), f1(x), ..., f9(x), corresponding to the ten character types 0 to 9.
To identify a character, the value of the discriminant function fk(x) of each character type is calculated from the N-dimensional vector x extracted from the character image. Because fk(x) is the similarity to character type k, the character type k with the largest value of fk(x) becomes the first candidate of the recognition result. Likewise, the character type whose discriminant function has the second largest value becomes the second candidate. Recognition results are obtained in this way up to the n-th candidate.
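The candidate ranking described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the prototype-based scorer `make_prototype_scorer` and the two-dimensional prototypes are assumptions standing in for discriminant functions that would normally be learned from training images.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def make_prototype_scorer(prototype: Vector) -> Callable[[Vector], float]:
    """Hypothetical stand-in for a learned f_k(x): the negative squared
    distance to a class prototype, so closer vectors score higher."""
    def f_k(x: Vector) -> float:
        return -sum((a - b) ** 2 for a, b in zip(x, prototype))
    return f_k


def rank_candidates(
    x: Vector, dictionary: Dict[str, Callable[[Vector], float]], n: int
) -> List[Tuple[str, float]]:
    """Evaluate every f_k(x) and return the top-n (character type, similarity)."""
    scores = [(k, f(x)) for k, f in dictionary.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:n]


# Illustrative dictionary for three character types (made-up prototypes).
dictionary = {
    "A": make_prototype_scorer([0.0, 0.0]),
    "B": make_prototype_scorer([4.0, 0.0]),
    "C": make_prototype_scorer([0.0, 4.0]),
}
candidates = rank_candidates([1.0, 0.0], dictionary, n=2)
print(candidates)  # first candidate "A", second candidate "B"
```

The input vector lies closest to the prototype of "A", so "A" is the first candidate and "B" the second.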

FIG. 7 is a diagram for explaining the result of character identification. For example, the character images cut out by the character cutout (reference numeral 603) in FIG. 6 are recognized as shown in FIG. 7. A recognition result such as the one indicated by reference numeral 604 in FIG. 6 is thus obtained and converted into a code, such as a character code, that a computer can handle.
The character identification described above calculates the similarity between a character image and each character type to be recognized and obtains candidate characters from those similarities. The accuracy of this identification is important to the usefulness of an OCR device. Equally important, however, is rejection processing, which signals that a recognition result is doubtful.
FIG. 12 is a diagram showing examples of non-characters and ambiguous characters. Targets of rejection include non-characters, as in character example 1201 of FIG. 12, and ambiguous characters, as in character example 1202. Non-characters include, for example, images that contain only part of a character or several merged characters because of a character cutout error, and images contaminated by disturbances such as dirt. Ambiguous characters include, for example, the leftmost image of character example 1202, which cannot be distinguished as a 7 or a 9.
Precise rejection processing has several advantages. If results are saved with misrecognized characters, either the errors remain or all recognition results must be rechecked by hand to correct them. If the user can instead be told which recognition results are doubtful, only those parts need to be corrected. In addition, if rejection is accurate, a rejection can be taken as a sign that an earlier step such as preprocessing, character line extraction, or character cutout may have failed, and processing can be retried from one of those earlier steps with a different method or different conditions. This improves recognition accuracy.

Hereinafter, the rate at which characters in character images are recognized correctly is called the correct reading rate, the rate at which they are recognized incorrectly is called the misreading rate, and the rate at which recognition results are rejected is called the rejection rate. The correct reading rate, the misreading rate, and the rejection rate sum to 1. In general, if rejection is made too strong, not only misread results but also some correctly read results are rejected, so both the correct reading rate and the misreading rate fall. Rejection should therefore reduce the misreading rate while lowering the correct reading rate as little as possible.
Rejection methods are described next. Let x be the N-dimensional vector extracted from the input image, and let fk1 be the discriminant function corresponding to the first candidate character k1. Then fk1(x) is the similarity to character type k1. Setting r1(x) = -fk1(x), r1(x) can be regarded as the dissimilarity to character type k1. A threshold h1 is therefore set in advance, and when r1(x) > h1 the result is rejected because the dissimilarity is high (the similarity is low). This is intended to reject non-characters: when the input image is a non-character, its similarity is expected to be low even for the first candidate character.
Further, let fk2 be the discriminant function corresponding to the second candidate character k2. Then fk2(x) is the similarity to character type k2, and fk1(x) ≥ fk2(x). Setting r2(x) = fk2(x) - fk1(x), a larger value of r2(x) means that fk1(x) and fk2(x) are closer, which indicates that the identification is ambiguous between the first and second candidate characters. A threshold h2 is therefore set in advance, and when r2(x) > h2 the identification result is rejected as ambiguous.
FIG. 13 is a diagram showing examples of images to be rejected.
In addition, Patent Document 4 describes a method that calculates the degree of blurring r3(x) of a character, as in character example 1301 of FIG. 13, and the degree of crushing r4(x) of a character, as in character example 1302, and makes the rejection determination based on them. A threshold h3 is set in advance, and when r3(x) > h3 the result is rejected because the blurring is severe. Likewise, a threshold h4 is set in advance, and when r4(x) > h4 the result is rejected because the crushing is severe.
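The two similarity-based rejection values can be computed directly from the list of discriminant function outputs. The sketch below follows the definitions r1(x) = -fk1(x) and r2(x) = fk2(x) - fk1(x); the similarity values and the thresholds `h1` and `h2` are made-up numbers chosen only for illustration.

```python
from typing import Sequence, Tuple


def rejection_values(similarities: Sequence[float]) -> Tuple[float, float]:
    """Return (r1, r2) from the similarities f_k(x) of all character types."""
    ranked = sorted(similarities, reverse=True)
    f_k1, f_k2 = ranked[0], ranked[1]   # first and second candidates
    r1 = -f_k1                          # dissimilarity to the first candidate
    r2 = f_k2 - f_k1                    # ambiguity between top two candidates
    return r1, r2


def rejection_flags(
    similarities: Sequence[float], h1: float, h2: float
) -> Tuple[bool, bool]:
    """Return (non_character, ambiguous) flags using thresholds h1 and h2."""
    r1, r2 = rejection_values(similarities)
    return r1 > h1, r2 > h2


# Illustrative thresholds (assumptions, not values from the source).
H1, H2 = -0.5, -0.1

clear = rejection_flags([0.9, 0.2, 0.1], H1, H2)       # confident reading
ambiguous = rejection_flags([0.8, 0.75, 0.1], H1, H2)  # top two too close
non_char = rejection_flags([0.3, 0.05, 0.01], H1, H2)  # low similarity overall
print(clear, ambiguous, non_char)
```

Keeping the two flags separate, rather than merging them immediately, leaves open how the indices are combined, which is the question the rest of the document addresses.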

Patent Document 1: JP 2010-244372 A
Patent Document 2: JP H11-53466 A
Patent Document 3: JP 2004-171316 A
Patent Document 4: Japanese Patent Application No. 2011-212308

Non-Patent Document 1: Mohammed Cheriet, Nawwaf Kharma, Cheng-Lin Liu, and Ching Suen. Character Recognition Systems: A Guide for Students and Practitioners. Wiley-Interscience, 2007.
Non-Patent Document 2: Kenichiro Ishii, Naonori Ueda, Eisaku Maeda, Hiroshi Murase. Pattern Recognition. Ohmsha.

Various indices can be used to reject characters, such as the non-character degree (dissimilarity) r1, the ambiguity r2, the blurring degree r3, and the crushing degree r4 described above. How these indices should be combined, however, is not obvious. The prior art either takes a simple approach, such as rejecting anything rejected by any one of the criteria, or combines several indices by manual trial and error.
The former simple approach is computationally expensive because every rejection index must be calculated. Moreover, because a result is rejected whenever any one index exceeds its threshold, rejection tends to be too strong and the correct reading rate can fall, so the approach is not necessarily suited to the purpose of rejection, which is to achieve a high correct reading rate together with a low misreading rate. The latter approach, manual trial and error, becomes very costly as the number of indices grows and may be difficult to carry out.
In view of the above, an object of the present invention is to provide a fast rejection method with a high correct reading rate and a low misreading rate at a low human cost.
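The drawback of the simple prior-art combination can be seen in a small sketch. The code below rejects a result when any index exceeds its threshold and then measures the three rates; the four samples and the thresholds are invented for illustration and do not come from the source.

```python
from typing import List, Sequence, Tuple

# Each sample: (rejection values r1..rn, whether the first candidate was correct)
Sample = Tuple[Sequence[float], bool]


def reject_if_any(values: Sequence[float], thresholds: Sequence[float]) -> bool:
    """Simple combination: reject when any index exceeds its threshold.
    Note that every rejection value must already have been computed."""
    return any(r > h for r, h in zip(values, thresholds))


def reading_rates(
    samples: List[Sample], thresholds: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (correct reading rate, misreading rate, rejection rate)."""
    correct = misread = rejected = 0
    for values, is_correct in samples:
        if reject_if_any(values, thresholds):
            rejected += 1
        elif is_correct:
            correct += 1
        else:
            misread += 1
    n = len(samples)
    return correct / n, misread / n, rejected / n


samples: List[Sample] = [
    ([0.1, 0.2], True),   # accepted, correct
    ([0.7, 0.1], False),  # rejected, would have been a misread
    ([0.2, 0.6], True),   # rejected although it was read correctly
    ([0.3, 0.4], False),  # accepted, misread
]
rates = reading_rates(samples, thresholds=[0.5, 0.5])
print(rates)  # the three rates always sum to 1
```

The third sample shows the cost of over-strong rejection: a correct reading is lost because one of the indices happened to exceed its threshold.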

According to the first solution of the present invention, there is provided a character recognition device comprising:
a plurality of rejection value calculation units, each of which calculates a rejection value with a preset rejection function for the recognition result of a character identified from an input image; and
one or more rejection determination units, each of which determines whether to reject the recognition result based on one or more rejection values calculated by one or more of the plurality of rejection value calculation units,
wherein, using the plurality of rejection value calculation units combined based on the correlation among them, the rejection determination unit makes a rejection determination on the recognition result based on the plurality of rejection values, rejects the recognition result determined to be rejected, and causes the recognition result not determined to be rejected to be stored in a storage unit or displayed on a display unit.

According to the second solution of the present invention, there is provided a character recognition method comprising:
using a plurality of rejection value calculation units, each of which calculates a rejection value with a preset rejection function for the recognition result of a character identified from an input image; and
using one or more rejection determination units, each of which determines whether to reject the recognition result based on one or more rejection values calculated by one or more of the plurality of rejection value calculation units,
wherein, using the plurality of rejection value calculation units combined based on the correlation among them, the rejection determination unit makes a rejection determination on the recognition result based on the plurality of rejection values, rejects the recognition result determined to be rejected, and causes the recognition result not determined to be rejected to be stored in a storage unit or displayed on a display unit.

According to the third solution of the present invention, there is provided a character recognition program for causing a computer to execute:
a function in which a processing unit uses a plurality of rejection value calculation units to calculate rejection values with preset rejection functions for the recognition result of a character identified from an input image;
a function in which the processing unit uses one or more rejection determination units, each of which determines whether to reject the recognition result based on one or more rejection values calculated by one or more of the plurality of rejection value calculation units; and
a function in which the processing unit, using the plurality of rejection value calculation units combined based on the correlation among them, has the rejection determination unit make a rejection determination on the recognition result based on the plurality of rejection values, rejects the recognition result determined to be rejected, and causes the recognition result not determined to be rejected to be stored in a storage unit or displayed on a display unit.

According to the present embodiment, a fast rejection method with a high correct reading rate and a low misreading rate can be provided at a low human cost.

An example of a flowchart explaining the processing of the character recognition device of Example 4 of the present invention.
An example of a configuration diagram of the character recognition device.
A diagram for explaining two rejection values with high independence.
A diagram for explaining two rejection values with low independence.
An example of a flowchart explaining the processing of a character recognition device of technology related to the present invention.
A diagram for explaining the flow of document digitization by a character recognition device.
A diagram for explaining the result of character identification.
A diagram showing examples of rejection values.
A diagram for explaining the character identification scheme.
A diagram for explaining character cutout processing.
A diagram for explaining character recognition and recognition result selection processing.
A diagram showing examples of non-characters and ambiguous characters.
A diagram showing examples of images to be rejected.
A diagram showing an example of feature extraction processing.
A diagram showing an example of the character image database for learning.
A diagram showing the rejection region for the serial configuration.
An example of a flowchart explaining the processing of the character recognition devices of Example 1 and Example 2 of the present invention.
An example of a flowchart explaining the processing of the character recognition device of Example 3 of the present invention.
Explanatory diagram (1) of the gradient feature extraction method.
Explanatory diagram (2) of the gradient feature extraction method.
An explanatory diagram of the rejection function.
A flowchart of the rejection value configuration processing.

Hereinafter, examples will be described with reference to the drawings.

1. Overview

In this embodiment, to give one example, the character recognition device includes:
a document imaging unit that obtains a document image by optically scanning a document;
a preprocessing unit having means for removing noise and background from the input image and binarizing it to generate a binary image;
a layout analysis unit having means for analyzing the document structure and the figure and table structure of the binary image;
a character string extraction unit having means for extracting character-string-unit images from the binary image;
a character cutout unit having means for cutting character-unit images out of each extracted character string image;
a character identification unit having means for recognizing the character in each character-unit image cut out by the character cutout unit;
a rejection determination unit that has a plurality of rejection value calculation means, places rejection value calculation means with higher rejection capability earlier, and has means for determining whether to reject the recognition result such that, when rejection is determined based on a rejection value calculated by an earlier rejection value calculation means, the later rejection value calculation processing is omitted;
a recognition result selection unit having means for selecting the recognition result of each character string image based on the recognition results and the rejection determination results;
a retry determination unit having means for determining, based on the recognition result, whether to perform recognition again; and
a post-recognition processing unit having means for performing processing such as saving the recognition result or outputting it to a display device.

The character recognition device of this embodiment may be characterized in that the rejection determination unit places rejection value calculation means earlier the higher their rejection efficiency, which is based on the strength of their rejection capability and their rejection value calculation cost, and determines whether to reject the recognition result such that, when rejection is determined based on a rejection value calculated by an earlier rejection value calculation means, the later rejection value calculation processing is omitted.
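A serial arrangement with early exit can be sketched as follows. This passage does not give a formula for rejection efficiency, so the ratio `capability / cost` used here is an assumption, as are the `RejectionStage` class, the stage names, and all numeric values.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class RejectionStage:
    name: str
    rejection_function: Callable[[Sequence[float]], float]  # r(x)
    threshold: float                                         # h
    capability: float  # hypothetical: rejection capability measured offline
    cost: float        # hypothetical: relative cost of computing r(x)

    @property
    def efficiency(self) -> float:
        # Assumed definition: capability gained per unit of calculation cost.
        return self.capability / self.cost


def order_stages(stages: List[RejectionStage]) -> List[RejectionStage]:
    """Place stages with higher rejection efficiency earlier."""
    return sorted(stages, key=lambda s: s.efficiency, reverse=True)


def serial_reject(
    x: Sequence[float], stages: List[RejectionStage]
) -> Tuple[bool, List[str]]:
    """Evaluate stages in order; stop at the first stage that rejects.
    Returns (rejected, names of the stages that were actually computed)."""
    evaluated: List[str] = []
    for stage in stages:
        evaluated.append(stage.name)
        if stage.rejection_function(x) > stage.threshold:
            return True, evaluated  # later rejection values are never computed
    return False, evaluated


stages = order_stages([
    RejectionStage("blur", lambda x: x[0], 0.5, capability=0.2, cost=4.0),
    RejectionStage("dissimilarity", lambda x: x[1], 0.5, capability=0.6, cost=1.0),
    RejectionStage("ambiguity", lambda x: x[2], 0.5, capability=0.3, cost=1.0),
])
early = serial_reject([0.9, 0.8, 0.1], stages)
full = serial_reject([0.1, 0.1, 0.1], stages)
print(early, full)
```

In the first call only the cheapest, most capable stage runs before the result is rejected; in the second call no stage rejects, so all three are evaluated.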

The character recognition device of this embodiment may be characterized in that the rejection determination unit generates a new rejection value from the rejection values of a plurality of rejection value calculation means arranged in parallel and makes the rejection determination based on that new rejection value.
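A parallel arrangement can be sketched as a function that merges several rejection values into one before thresholding. The form of the combining function is not specified in this passage, so the weighted sum below, its weights, and the threshold are assumptions chosen for illustration.

```python
from typing import Sequence


def combined_rejection_value(
    values: Sequence[float], weights: Sequence[float], bias: float = 0.0
) -> float:
    """Hypothetical combining function: a weighted sum of the rejection values."""
    return sum(w * r for w, r in zip(weights, values)) + bias


def parallel_reject(
    values: Sequence[float],
    weights: Sequence[float],
    threshold: float,
    bias: float = 0.0,
) -> bool:
    """Reject when the new, combined rejection value exceeds the threshold."""
    return combined_rejection_value(values, weights, bias) > threshold


# Two moderately high values reject together although neither exceeds 0.5 alone.
both_moderate = parallel_reject([0.4, 0.4], weights=[1.0, 1.0], threshold=0.7)
one_moderate = parallel_reject([0.4, 0.1], weights=[1.0, 1.0], threshold=0.7)
print(both_moderate, one_moderate)
```

Unlike the serial arrangement, every rejection value in the parallel group must be computed before a decision can be made.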

The character recognition device of this embodiment may be characterized in that the rejection determination unit has means for determining the independence of a plurality of rejection values and processes rejection value calculation means with high independence in series.

The character recognition device of this embodiment may be characterized in that the rejection determination unit has means for determining the independence of a plurality of rejection values and processes rejection value calculation means with low independence in parallel.

The character recognition device of this embodiment may be characterized in that the rejection determination unit has means for determining the independence of a plurality of rejection values, and that this means learns, with a cost function based on the identification error, a function that uses the rejection values to discriminate between the rejection image database and the correct-reading image database, compares the identification error of that function with the identification error obtained when the rejection values are configured in series, determines that the independence is low when the difference between the two errors is equal to or greater than a predetermined threshold, and determines that the independence is high otherwise.
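The comparison step of this independence test can be sketched as follows. The learning of the discriminating function is not reproduced here: `learned_rule` is a hand-written stand-in for it, the difference is assumed to be the serial error minus the learned function's error, and the two small data sets stand in for rejection values computed on the rejection image database and the correct-reading image database.

```python
from typing import Callable, List, Sequence

Rule = Callable[[Sequence[float]], bool]  # True means "reject"


def error_rate(
    rule: Rule,
    reject_set: List[Sequence[float]],
    accept_set: List[Sequence[float]],
) -> float:
    """Fraction of samples the rule classifies on the wrong side."""
    errors = sum(1 for v in reject_set if not rule(v))
    errors += sum(1 for v in accept_set if rule(v))
    return errors / (len(reject_set) + len(accept_set))


def independence_is_low(
    learned_rule: Rule,
    serial_rule: Rule,
    reject_set: List[Sequence[float]],
    accept_set: List[Sequence[float]],
    min_gap: float,
) -> bool:
    """Low independence when the learned function beats the serial
    configuration by at least min_gap (assumed direction of the difference)."""
    gap = (error_rate(serial_rule, reject_set, accept_set)
           - error_rate(learned_rule, reject_set, accept_set))
    return gap >= min_gap


def serial_rule(v: Sequence[float]) -> bool:
    return v[0] > 0.7 or v[1] > 0.7   # each value thresholded on its own


def learned_rule(v: Sequence[float]) -> bool:
    return v[0] + v[1] > 0.75         # stand-in for a learned joint function


# Rejection values that only separate the two databases jointly.
dependent = independence_is_low(
    learned_rule, serial_rule,
    reject_set=[(0.6, 0.6), (0.8, 0.3), (0.55, 0.65)],
    accept_set=[(0.2, 0.3), (0.3, 0.4), (0.6, 0.1)],
    min_gap=0.1,
)
# Rejection values that each separate the databases on their own.
independent = independence_is_low(
    learned_rule, serial_rule,
    reject_set=[(0.9, 0.1), (0.1, 0.9), (0.8, 0.2)],
    accept_set=[(0.2, 0.2), (0.3, 0.1), (0.1, 0.4)],
    min_gap=0.1,
)
print(dependent, independent)
```

In the first data set the serial thresholds miss two of the three images that should be rejected while the joint function misses none, so the pair would be arranged in parallel; in the second data set both configurations make no errors, so the pair would be arranged in series.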

2. Embodiment

棄却方法を備える文字認識装置の実施例について、図表を参照しながら説明する。本実施例の文字認識装置は、入力文書画像中の文字を検知、認識し、文字をコード化することよって、入力文書を電子化する装置である。入力文書には、一般文書の他に、例えば、帳票、明細書などがある。
図２は、本実施例の文字認識装置の一例を示す構成図である。
本実施例の文字認識装置２０１は、例えば、押印認識および帳票認識を行うものであり、入力装置２０２、表示装置２０３、イメージ取得装置２０４、通信装置２０５、演算装置（ＣＰＵ）２０６、外部記憶装置２０７を備える。外部記憶装置２０７は、正読画像データベース２１１及び棄却画像データベース２１２を含む。
入力装置２０２は、コマンド等を入力するためのキーボードやマウス等である。入力装置２０２は、演算装置（ＣＰＵ）２０６で実行されるプログラムの制御や、その他、接続機器の制御のために実行されるコマンド等を入力するための装置である。
表示装置２０３は、処理内容を適宜表示するディスプレイ等の装置である。
イメージ取得装置２０４は、スキャナなどのイメージ取得用の装置である。取得したイメージは、外部記憶装置等に記憶してもよい。
通信装置２０５は、ＰＣやサーバ等の外部機器からのデータのやりとりを行うために用いる。通信装置２０５は、外部機器からのユーザによる実行コマンドの取得や、画像やテキストなどの情報の外部機器からの取得等の目的に用いられる。また、通信装置２０５は、押印認識および帳票認識装置２０１での処理内容を外部機器に送信する等の目的にも用いられる。
演算装置（ＣＰＵ）２０６は、文書画像中の文字認識に用いる認識用辞書の生成などの処理を実行する演算装置である。
外部記憶装置２０７は、ＨＤＤ，メモリ等の外部記憶装置である。外部記憶装置２０７には、帳票画像、押印画像、押印認識用辞書などの各種データが保存されている。また、外部記憶装置には、演算装置（ＣＰＵ）２０６によって実行される処理の途中で生成されるデータ等を一時的に記憶しておくためにも用いられる。
入力装置２０２、表示装置２０３、イメージ取得装置２０４、通信装置２０５はなくてもよい。入力装置２０２が無い場合には、処理の開始は、通信装置２０５を用いて外部機器から指示するか、または、時刻指定等により自動的に行う。表示装置２０３が無い場合には、処理結果は通信装置２０５を用いて外部機器に送信するか、外部記憶装置２０７に記憶しておく。
処理を実行するモジュールの出力と入力は、外部記憶装置２０７を介して行ってもよい。すなわち、処理部１が、処理結果を処理部２に出力し、処理部２は、その処理結果を入力として受け取る場合、実際には、処理部１が処理結果を外部記憶装置２０７に出力し記憶しておき、処理部２では、外部記憶装置２０７に記憶されている処理部１の出力結果を入力として取得してもよい。 An embodiment of a character recognition device having a rejection method will be described with reference to a diagram. The character recognition apparatus according to the present embodiment is an apparatus that digitizes an input document by detecting and recognizing characters in an input document image and encoding the characters. In addition to general documents, input documents include, for example, forms and specifications.
FIG. 2 is a configuration diagram illustrating an example of a character recognition apparatus according to the present embodiment.
A character recognition device 201 of this embodiment performs, for example, stamp recognition and form recognition, and includes an input device 202, a display device 203, an image acquisition device 204, a communication device 205, a calculation device (CPU) 206, and an external storage device. 207. The external storage device 207 includes a correctly read image database 211 and a reject image database 212.
The input device 202 is a keyboard, a mouse, or the like for inputting commands. It is used to input commands that control the program executed by the arithmetic unit (CPU) 206 and that control other connected devices.
The display device 203 is a device, such as a display, that shows processing contents as appropriate.
The image acquisition device 204 is a device for acquiring images, such as a scanner. The acquired image may be stored in an external storage device or the like.
The communication device 205 is used for exchanging data with an external device such as a PC or a server. It is used, for example, to receive execution commands issued by a user from an external device and to acquire information such as images and text from an external device. It is also used to send the processing results of the character recognition device 201 to an external device.
The arithmetic unit (CPU) 206 executes processing such as generating the recognition dictionary used for character recognition in document images.
The external storage device 207 is a storage device such as an HDD or a memory. It stores various data such as form images, seal images, and a seal recognition dictionary. It is also used to temporarily store data generated during processing executed by the arithmetic unit (CPU) 206.
The input device 202, the display device 203, the image acquisition device 204, and the communication device 205 may be omitted. If there is no input device 202, processing is started by an instruction from an external device via the communication device 205, or automatically, for example at a specified time. If there is no display device 203, the processing result is transmitted to an external device via the communication device 205 or stored in the external storage device 207.
The outputs and inputs of the modules that execute processing may be passed through the external storage device 207. That is, when processing unit 1 outputs a processing result to processing unit 2 and processing unit 2 receives that result as its input, processing unit 1 may in practice write the result to the external storage device 207, and processing unit 2 may read the stored output of processing unit 1 from the external storage device 207 as its input.

Next, the processing performed by the character recognition device 201 in the present embodiment will be described.
In the following, the processing of a character recognition device according to the related art of the present invention is first described with reference to FIG. 5. The processing of this embodiment is then described with reference to FIG. 1.
First, the processing of the character recognition device according to the related art of the present invention will be described.
FIG. 5 shows a typical example of the flow of document digitization by a character recognition device.
In document imaging (scanning) 101, the CPU 206 of the character recognition device 201 reads a document with a scanner or the like and converts it into an image. If the background is printed in color, the CPU 206 may also perform processing such as color dropout, which optically removes printing of a specific color. Input documents include general documents, forms, and mark sheets designed from the outset to be processed by a character recognition device.
In the preprocessing 102, the CPU 206 performs processing such as binarization (conversion to black and white) of the color document image, noise removal, and removal of unnecessary portions such as background printing. The binary image after preprocessing looks like, for example, the form image 601 in FIG. 6.
In the layout analysis 103, the CPU 206 analyzes the layout of the binary image and recognizes the positions of figures and tables, the paragraph structure, the positions of items and data, and so on. Regarding the positions of items and data, for example, in the case of reference number 602 in FIG. 6, the CPU 206 infers from the table structure that "payment amount" above the field 602 is the item name and that the frame below it, which contains 7,890,123, is the data frame. For papers and technical reports, metadata extraction may also be performed, such as recognizing from the document structure and positional relationships where the title, authors, abstract, and page numbers are written.
In the character string extraction 104, the CPU 206 extracts images in units of character strings from the document image, such as the image of one line for a general document or the image inside a frame for a table. For example, the image inside a table frame is extracted, as indicated by reference numeral 602 in FIG. 6.
In the series of processes consisting of character cut-out 105, character recognition 503, and recognition result selection 114, the characters in each extracted character string image are recognized. As shown by reference number 603 in FIG. 6, the character string image is divided into individual characters and the character in each character image is recognized, so that the characters are finally converted, as shown by reference number 604, into codes that a computer can handle, such as character codes.
The processing from character cut-out 105 to recognition result selection 114, which follows the character string extraction 104, will be described with an example.

FIG. 10 is a diagram for explaining the character cut-out processing.
First, the character cut-out 105 will be described. For example, assume that a character string image such as image 1001 in FIG. 10 is obtained by character string extraction. In the character cut-out 105, the CPU 206 first creates cutting candidate points based on, for example, points where character strokes intersect and points where strokes are interrupted. Image 1002 in FIG. 10 shows the division at the cutting candidate points; in this example, the image is divided into four pieces. Each piece, and each combination of several adjacent pieces, becomes a character image candidate. In the example of image 1003 in FIG. 10, the combination of the first and second pieces from the left and the combination of the second and third pieces from the left are also character image candidates, giving six candidates in total. Each route running left to right from the leftmost point to the rightmost point is a cut-out candidate for the character string 1001.
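The way single pieces and merged pieces form left-to-right routes can be sketched as follows. This is a minimal illustration, not the patented procedure: the function names are hypothetical, and which adjacent pieces may be merged is simply passed in as data, because the text does not state the criterion that allows pieces 1+2 and 2+3 but not 3+4.

```python
def candidate_spans(num_pieces, allowed_merges):
    """Return (start, end) piece ranges, end exclusive, that are character image candidates."""
    spans = [(i, i + 1) for i in range(num_pieces)]  # every single piece
    spans += sorted(allowed_merges)                   # merged adjacent pieces (assumed given)
    return spans


def cutout_routes(num_pieces, spans):
    """Enumerate every left-to-right route that covers all pieces exactly once."""
    by_start = {}
    for start, end in spans:
        by_start.setdefault(start, []).append((start, end))

    routes = []

    def walk(pos, path):
        if pos == num_pieces:          # reached the rightmost point
            routes.append(path)
            return
        for span in by_start.get(pos, []):
            walk(span[1], path + [span])

    walk(0, [])
    return routes


# Four pieces; pieces 0+1 and 1+2 may also be read as one character (as in image 1003).
spans = candidate_spans(4, {(0, 2), (1, 3)})
routes = cutout_routes(4, spans)
```

With these inputs there are six character image candidates, matching the count in the text, and each entry of `routes` is one cut-out candidate for the whole string.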
FIG. 7 is a diagram for explaining the result of character identification.
Next, in character recognition 503, the CPU 206 recognizes the character in each candidate character image. Here, for example, as shown in FIG. 7, a correct-answer candidate character (first-rank candidate character type) and the similarity (likelihood, reliability) for that candidate are obtained for each character image.
Next, based on the candidate characters and similarities obtained by the character recognition 503, the CPU 206 creates a network of recognition result candidates, as indicated by reference numeral 1101 in FIG. 11. Reference numeral 1102 shows the same network with the images removed. Each route running left to right from the leftmost point to the rightmost point is a recognition result candidate. In addition, if the CPU 206 determines that the recognition result for a character image has low reliability, it performs rejection processing, for example by setting a rejection flag on the recognition result, to inform later processing or the user that the result is unreliable.
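A recognition candidate network of this kind can be sketched as a list of edges, each carrying a candidate character and its similarity. The edge values below are made up for illustration, and ranking routes by mean similarity is only an assumption made to keep the example small; the recognition result selection 114 described in the text also consults a word dictionary.

```python
# Each edge: (start_node, end_node, candidate_char, similarity). Values are illustrative.
edges = [
    (0, 1, "1", 0.9), (1, 2, "2", 0.8), (2, 3, "3", 0.7), (3, 4, "4", 0.9),
    (0, 2, "N", 0.3), (1, 3, "B", 0.2),
]


def routes(edges, start, goal):
    """Yield (text, similarities) for every left-to-right route from start to goal."""
    if start == goal:
        yield "", []
        return
    for s, e, ch, sim in edges:
        if s == start:
            for text, sims in routes(edges, e, goal):
                yield ch + text, [sim] + sims


# Rank recognition result candidates by mean similarity (an assumed scoring rule).
results = sorted(routes(edges, 0, 4), key=lambda r: sum(r[1]) / len(r[1]), reverse=True)
```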
The internal processing of the character recognition 503 will now be described. Here, the CPU 206 recognizes the character drawn in each character image and also performs rejection processing on the recognition results.
First, the character identification 106 will be described. The CPU 206 first performs feature extraction, which converts a character image into a vector. If the number of dimensions of the vector is N, feature extraction represents one character image as an N-dimensional vector. Representing character images as vectors makes it possible to handle their distribution statistically.

FIG. 14 is a diagram illustrating an example of feature extraction processing.
Feature extraction will be described with reference to FIG. 14. First, the CPU 206 normalizes the character image. Input character images generally differ in size, so normalization makes their sizes uniform so that they can be handled consistently in later processing. In addition, even characters of the same character type can differ greatly in shape depending on the writing instrument, the writer, the font, and so on, which lowers recognition accuracy. The normalization process therefore transforms both the size and the shape of the input character image, unifying the size and reducing the variation in shape within the same character type. Image 1401 in FIG. 14 is an example of an input character image, and image 1402 is the image transformed to a size of 64 × 64. There are various normalization methods, which are described in detail in, for example, Non-Patent Document 1.
Next, feature extraction is performed to convert the normalized image into a vector. There are also various feature extraction methods, which are described in detail in, for example, Non-Patent Document 1. Here, the simplest example, pixel feature extraction, is used for explanation. In pixel feature extraction, the normalized image is divided into small regions. In the example of FIG. 14, the normalized image 1402 is divided into 64 small regions, as shown in image 1403. The image is then converted into a vector whose elements are the numbers of black pixels in the small regions. Since there are 64 small regions, a 64-dimensional vector is generated, as in image 1404.
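Pixel feature extraction as described above can be sketched in a few lines. The sketch assumes that the 64 small regions form an 8 × 8 grid of equal cells and that the normalized image is a nested list with 1 for black and 0 for white; neither detail is stated in the text, and the function name is hypothetical.

```python
def pixel_features(image, grid=8):
    """Count black pixels (value 1) in each cell of a grid x grid partition."""
    size = len(image)
    cell = size // grid                      # 64 // 8 = 8 pixels per cell side
    features = []
    for by in range(grid):
        for bx in range(grid):
            count = sum(
                image[y][x]
                for y in range(by * cell, (by + 1) * cell)
                for x in range(bx * cell, (bx + 1) * cell)
            )
            features.append(count)
    return features


# 64x64 normalized image containing a black vertical bar in columns 30..33.
image = [[1 if 30 <= x < 34 else 0 for x in range(64)] for y in range(64)]
vec = pixel_features(image)                  # 64-dimensional feature vector
```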
As another example of a widely used feature extraction method, a gradient feature extraction method will be described.

FIGS. 19 and 20 are explanatory diagrams (1) and (2) for the gradient feature extraction method.
Here, it is assumed that the normalized image generated by normalization has a one-pixel white border. Let f(i, j) denote the pixel value of the normalized image at pixel point (i, j). The CPU 206 then calculates the gradient vector g = (gx, gy) at each pixel point (i, j) of the normalized image as follows. This corresponds to applying the filter shown in FIG. 19.
$$g_x(i,j) = \{\, f(i+1,j+1) + 2f(i,j+1) + f(i-1,j+1) - f(i+1,j-1) - 2f(i,j-1) - f(i-1,j-1) \,\} / 8$$
$$g_y(i,j) = \{\, f(i+1,j+1) + 2f(i+1,j) + f(i+1,j-1) - f(i-1,j+1) - 2f(i-1,j) - f(i-1,j-1) \,\} / 8$$
However, in the above formula, when the pixel point (i, j) is at the edge of the image, the surrounding pixel points may be outside the image area. At that time, the value of f in the area outside the image is considered to be 0, and the above formula is calculated. As a result, a gradient vector g = (gx, gy) of pixel values is obtained at each pixel point (i, j).
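The two gradient formulas and the zero-outside-the-image rule can be sketched directly. The sketch assumes the normalized image is stored as a nested list indexed as `f[i][j]`; the function name and that storage layout are illustrative choices, not taken from the text.

```python
def gradient(f, i, j):
    """Return (gx, gy) at pixel (i, j); pixels outside the image are treated as 0."""
    def px(a, b):
        if 0 <= a < len(f) and 0 <= b < len(f[0]):
            return f[a][b]
        return 0

    gx = (px(i + 1, j + 1) + 2 * px(i, j + 1) + px(i - 1, j + 1)
          - px(i + 1, j - 1) - 2 * px(i, j - 1) - px(i - 1, j - 1)) / 8
    gy = (px(i + 1, j + 1) + 2 * px(i + 1, j) + px(i + 1, j - 1)
          - px(i - 1, j + 1) - 2 * px(i - 1, j) - px(i - 1, j - 1)) / 8
    return gx, gy


# 3x3 test image whose value steps from 0 to 1 along the j axis.
f = [[1 if j == 2 else 0 for j in range(3)] for i in range(3)]
center = gradient(f, 1, 1)   # interior pixel
edge = gradient(f, 0, 1)     # border pixel, uses the zero padding
```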
Next, the CPU 206 decomposes the vector g(i, j) into components g0(i, j), g1(i, j), ..., g7(i, j) along the eight directions at 45-degree intervals indicated by reference numeral 2001 in FIG. 20. The vector is decomposed into the two directions adjacent to the direction of g(i, j). However, if the direction of g(i, j) coincides exactly with one of the eight directions, no decomposition is needed; if it coincides with direction 0, for example, then g0(i, j) is set to the length of the vector g(i, j), and g1(i, j) = ... = g7(i, j) = 0 for the other directions. The decomposition method is illustrated by reference numeral 2002 in FIG. 20. When g(i, j) lies between direction 0 and direction 1, as shown by reference numeral 2002, the CPU 206 decomposes the vector g(i, j) into components along direction 0 and direction 1. If the length of the direction-0 component is p0 and the length of the direction-1 component is p1, then g0(i, j) = p0, g1(i, j) = p1, and g2(i, j) = ... = g7(i, j) = 0.
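One common way to realize this decomposition is the parallelogram rule, in which the two component vectors along the neighbouring directions add up to the original gradient. The sketch below assumes that rule, and also assumes that direction 0 points along the positive x axis with directions numbered counterclockwise; the text and FIG. 20 may define the numbering differently.

```python
import math


def decompose(gx, gy):
    """Split gradient (gx, gy) into 8 direction values at 45-degree intervals."""
    planes = [0.0] * 8
    length = math.hypot(gx, gy)
    if length == 0.0:
        return planes
    step = math.pi / 4
    angle = math.atan2(gy, gx) % (2 * math.pi)
    q = angle // step                    # index of the lower neighbouring direction
    alpha = angle - q * step             # angular offset from that direction
    k = int(q) % 8
    # Parallelogram rule: p_k * u_k + p_(k+1) * u_(k+1) equals the original vector.
    planes[k] = length * math.sin(step - alpha) / math.sin(step)
    planes[(k + 1) % 8] = length * math.sin(alpha) / math.sin(step)
    return planes


aligned = decompose(1.0, 0.0)            # exactly on direction 0
between = decompose(-3.0, 2.0)           # between directions 3 and 4
```

When the gradient lies exactly on one of the eight directions, `alpha` is zero and the whole length goes to that single direction, which matches the special case described in the text.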
In this way, eight direction images g0(i, j), ..., g7(i, j) are generated. To improve robustness against character deformation, these images may be blurred with a Gaussian filter; in that case, the blurred direction images are again denoted g0(i, j), ..., g7(i, j). Next, the CPU 206 divides each direction image into small regions and generates a vector whose elements are the sums of the pixel values in each small region. If each direction image is divided into 64 small regions, 64 values are obtained from each direction image. Since this is done for each direction, a total of 64 × 8 = 512 values are obtained for the eight directions. Taking these as the vector components yields a 512-dimensional vector.
The above is the description of the gradient feature extraction method.

As described above, the CPU 206 converts the character image into a vector. In the following, N denotes the number of dimensions of the vector generated by feature extraction. Each character image is thus represented as a point in N-dimensional space, and images of the same character type are distributed in nearby regions. This is shown schematically in FIG. 9.
FIG. 9 is a diagram for explaining the character identification scheme. Circles, triangles, and squares represent N-dimensional vector points extracted from character images of character type A, character type B, and character type C, respectively. For example, each circle represents a vector extracted from a different image of character type A.

Next, the CPU 206 refers to a character identification dictionary created in advance and, based on the vector extracted from the character image, identifies the character drawn in the character image.
First, the character identification dictionary will be described. The dictionary stores, for example, for each character type k to be identified, a discriminant function fk(x) that takes an N-dimensional vector as its argument and returns a real value. The discriminant function fk(x) is generated in advance by learning so that it takes a large value for an N-dimensional vector x generated from a character image showing character type k, and a small value for an N-dimensional vector x generated from a character image showing any other character type. The value of fk(x) is called the similarity, likelihood, or the like of the vector x with respect to character type k. For example, when recognizing digits, there are ten discriminant functions f0(x), f1(x), ..., f9(x), corresponding to the ten character types 0 to 9.
The CPU 206 can create these discriminant functions using, for example, a learning character image database consisting of character images and character labels.
FIG. 15 is a diagram illustrating an example of a learning character image database. As shown in the figure, a character label is a coded correct-answer label indicating the character drawn in the character image. The learning character image database can be created by collecting character images, for example by having people write designated characters in designated frames. The CPU 206 converts each image in the learning character image database into an N-dimensional vector by the same method as described above. Based on these N-dimensional vectors and the correct-answer labels, the CPU 206 then generates each discriminant function fk(x) by learning so that it takes a large value for N-dimensional vectors corresponding to character type k and a small value for N-dimensional vectors corresponding to other character types. Various methods such as SVM (Support Vector Machine), neural networks, Gaussian models, and LVQ (Learning Vector Quantization) can be used to learn the discriminant functions.
In character identification, the CPU 206 calculates the value of the discriminant function fk(x) for each character type using the N-dimensional vector x extracted from the character image. Since the value of fk(x) is the similarity to character type k, the character type k with the largest value of fk(x) is the first candidate for the recognition result. Likewise, the character type whose discriminant function has the second largest value is the second candidate. In this way, recognition results are obtained up to the n-th candidate. For example, the recognition results for the character images cut out by the character cut-out 603 in FIG. 6 are as shown in FIG. 7. A recognition result is thus obtained as indicated by reference numeral 604 in FIG. 6 and converted into codes, such as character codes, that a computer can handle.
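Ranking candidates by discriminant value can be sketched as below. The discriminant used here, the negative squared distance to a class prototype, is only a stand-in chosen to keep the example short; the text names SVM, neural networks, Gaussian models, and LVQ as possible learning methods and does not prescribe this one. All names and prototype values are hypothetical.

```python
def make_prototype_discriminant(prototype):
    """Hypothetical f_k(x): negative squared distance to a class prototype."""
    def f(x):
        return -sum((a - b) ** 2 for a, b in zip(x, prototype))
    return f


def rank_candidates(x, discriminants):
    """Return [(char, similarity), ...] sorted by decreasing discriminant value."""
    scores = [(k, f(x)) for k, f in discriminants.items()]
    return sorted(scores, key=lambda kv: kv[1], reverse=True)


# Toy dictionary with three character types in a 2-dimensional feature space.
discriminants = {
    "0": make_prototype_discriminant([0.0, 0.0]),
    "1": make_prototype_discriminant([1.0, 0.0]),
    "7": make_prototype_discriminant([1.0, 1.0]),
}
ranking = rank_candidates([0.9, 0.2], discriminants)
first_candidate, second_candidate = ranking[0][0], ranking[1][0]
```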
The above is the description of the character identification 106.

The character identification described above calculates the similarity between a character image and each character type to be recognized, and obtains candidate characters based on those similarities. The accuracy of this character identification is important for making an OCR device more useful. However, rejection processing, which flags a recognition result when it is doubtful, is also important.
FIG. 12 is a diagram illustrating examples of non-characters and ambiguous characters. Objects to be rejected include, for example, non-characters as indicated by reference number 1201 in FIG. 12 and ambiguous characters as indicated by reference number 1202. Non-characters include, for example, part of a character or an image of several merged characters resulting from a character cut-out error, and images contaminated by disturbances such as dirt. Ambiguous characters include, for example, one that cannot be distinguished between 7 and 9, such as the leftmost image of reference number 1202.
Precise rejection processing has several advantages. First, if results are saved with misrecognized characters, either the errors remain or all recognition results must be rechecked manually to correct them. In contrast, if the user can be told which recognition results are doubtful, the user only has to correct those parts. Second, if rejection can be performed accurately, a rejection can be taken as a sign that an earlier step, such as preprocessing, character line extraction, or character cut-out, may have failed, and processing can be retried from one of those steps with a different processing method or different processing conditions. This can raise recognition accuracy.
In the following, the rate at which characters in character images are correctly recognized is called the correct reading rate, the rate at which they are incorrectly recognized is called the misreading rate, and the rate at which recognition results are rejected is called the rejection rate. In general, if rejection is made too aggressive, not only are misread characters rejected, but some correctly read characters are rejected as well, so both the correct reading rate and the misreading rate fall. Rejection should therefore reduce the misreading rate while lowering the correct reading rate as little as possible.
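The three rates can be computed from labelled results as sketched below. The sketch assumes that a rejected result counts as neither a correct reading nor a misreading, so that the three rates sum to 1; this is consistent with the statement that aggressive rejection lowers both the correct reading rate and the misreading rate, but the text does not give explicit formulas. The function and field names are hypothetical.

```python
def reading_rates(results):
    """results: list of (recognized_char, true_char, rejected) tuples."""
    total = len(results)
    rejected = sum(1 for _, _, rej in results if rej)
    correct = sum(1 for rec, true, rej in results if not rej and rec == true)
    misread = total - rejected - correct
    return {
        "correct": correct / total,
        "misread": misread / total,
        "rejected": rejected / total,
    }


results = [
    ("7", "7", False),   # accepted and correct
    ("9", "7", False),   # accepted but wrong: a misreading
    ("9", "7", True),    # wrong, but caught by rejection
    ("3", "3", True),    # correct, but rejected anyway
]
rates = reading_rates(results)
```

The last sample shows the cost of rejection: a correctly read character that is rejected lowers the correct reading rate without lowering the misreading rate.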
The non-character rejection 501 and the ambiguous character rejection 502, which are the processes of the rejection determination unit, are described below.
First, the non-character rejection 501 will be described. Let x be the N-dimensional vector extracted from the input character image, and let fk1 be the discriminant function corresponding to the first candidate character k1. Then fk1(x) is the similarity to character type k1. If we set r1(x) = −fk1(x), then r1(x) can be regarded as the dissimilarity to character type k1. The CPU 206 therefore determines a threshold h1 in advance and, when r1(x) > h1, judges that the dissimilarity is high (the similarity is low) and rejects the result. This is intended to reject non-characters, since a non-character input image is expected to have low similarity even to the first candidate character.
Next, the ambiguous character rejection 502 will be described. Let fk2 be the discriminant function corresponding to the second candidate character k2. Then fk2(x) is the similarity to character type k2, and fk1(x) ≧ fk2(x). If we set r2(x) = fk2(x) − fk1(x), then the larger the value of r2(x), the closer the values of fk1(x) and fk2(x), which indicates that the identification is ambiguous between the first and second candidate characters. The CPU 206 therefore determines a threshold h2 in advance and, when r2(x) > h2, rejects the result because the identification is ambiguous. This processing may be skipped if the non-character rejection 501 has already decided to reject.
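The two rejection tests can be sketched as one function. The threshold values in the example are arbitrary, and the function name and return strings are hypothetical; only the definitions of r1, r2, and the skip of the second test follow the text.

```python
def rejection_by_similarity(ranking, h1, h2):
    """ranking: [(char, similarity), ...] sorted best first, with at least two entries.

    Returns a rejection reason, or None when the result is accepted.
    """
    f_k1 = ranking[0][1]
    f_k2 = ranking[1][1]
    r1 = -f_k1                       # dissimilarity to the first candidate
    if r1 > h1:
        return "non-character"       # ambiguity test is skipped in this case
    r2 = f_k2 - f_k1                 # always <= 0; values near 0 mean a close call
    if r2 > h2:
        return "ambiguous"
    return None


h1, h2 = -0.5, -0.1                  # illustrative thresholds
clear = rejection_by_similarity([("7", 0.9), ("1", 0.2)], h1, h2)
close = rejection_by_similarity([("7", 0.9), ("9", 0.85)], h1, h2)
weak = rejection_by_similarity([("7", 0.3), ("1", 0.1)], h1, h2)
```

Because fk1(x) ≧ fk2(x), r2 is never positive, so a useful threshold h2 is a small negative number (or zero to reject only exact ties).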

The above is the description of the processing in the character recognition 503. This processing is performed for each character image.
In the recognition result selection 114, the CPU 206 refers to a word dictionary or the like and selects the final recognition result from the recognition result candidates while comprehensively evaluating the recognition similarity (reliability) of each character. For address recognition, for example, the word dictionary can be a dictionary in which a list of addresses has been stored in advance. For general document recognition, it is a dictionary of words or the like.
The above is the processing from character cut-out 105 to recognition result selection 114. This processing is performed for each character string image.

Next, in the retry determination 115, the CPU 206 determines whether to repeat recognition with modified processing. Reprocessing may target, for example, the entire document image, a character string image, or a single character image. The CPU 206 performs reprocessing when, for example, the character string recognition result contains a character with low similarity (likelihood, reliability), no result matching the word dictionary was obtained, or some character could not be read. When reprocessing, the CPU 206 tries recognition again from one of the earlier steps, changing the processing method or the processing conditions, for example by changing the binarization or noise removal method of the preprocessing 102. Finally, in the post-recognition processing 116, the CPU 206 performs processing such as saving the recognition result in a storage device or showing it on a display.
The above is the processing flow of the character recognition device according to the related art of the present invention.

FIG. 13 is a diagram illustrating examples of images to be rejected. As rejection indexes, in addition to r1 and r2 above, there is a method in which the CPU 206 calculates a degree of character blurring r3(x), as in reference numeral 1301 in FIG. 13, or a degree of character squashing r4(x), as in reference numeral 1302, and makes a rejection determination based on it. A threshold h3 is set in advance, and when r3(x) > h3 the result is rejected because the blurring is large. Likewise, a threshold h4 is set in advance, and when r4(x) > h4 the result is rejected because the squashing is large. Other quantities, such as the centroid position of the character image or the average stroke width, can also be used. For example, with the centroid position, a result is rejected if the character identification result is 8 but the centroid is far from the center.
Here, examples of the degree of blurring r3(x) and the degree of collapse r4(x) are given. Above, x was the vector obtained by feature extraction, but here x is the normalized image. For each character class, the average total pixel value m of the normalized images is calculated in advance from the learning DB. For an input image, r3(x) is m minus the total pixel value of its normalized image, and r4(x) is the total pixel value of its normalized image minus m. As a result, r3 is large when the total pixel value of the normalized input image is smaller than m, and r4 is large when it is larger than m.
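The two definitions above can be sketched in a few lines. This is only an illustration under stated assumptions: the normalized image is taken to be a list of rows of pixel values, and the function names and the sample thresholds are hypothetical, not taken from the source.

```python
def total_pixel_value(image):
    """Sum of all pixel values of a normalized image given as a list of rows."""
    return sum(sum(row) for row in image)


def blur_degree(image, m):
    """r3(x): m minus the total pixel value (grows as strokes become faint)."""
    return m - total_pixel_value(image)


def collapse_degree(image, m):
    """r4(x): total pixel value minus m (grows as strokes fill in)."""
    return total_pixel_value(image) - m


def reject_by_ink(image, m, h3, h4):
    """Reject when r3(x) > h3 or r4(x) > h4."""
    return blur_degree(image, m) > h3 or collapse_degree(image, m) > h4


# Hypothetical 2x2 binary images; m = 2 is an assumed class average.
faint = [[0, 1], [0, 0]]
heavy = [[1, 1], [1, 1]]
print(blur_degree(faint, 2), collapse_degree(heavy, 2))
```

Here m would be looked up for the character class returned by the character identification, as the paragraph above describes.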
However, it has not been clear how these indexes should be combined. The prior art either takes a simple approach, such as rejecting anything rejected by any one of the criteria, or combines multiple indexes through manual trial and error.
The former, simple approach requires calculating all of the rejection indexes, so the calculation cost is high. Moreover, because a result is rejected whenever any one index exceeds its threshold, rejection tends to be too strong and the correct reading rate may fall, so the approach is not necessarily suited to the purpose of rejection, which is to achieve a high correct reading rate together with a low misreading rate. The latter, manual trial and error becomes quite costly as the number of indexes grows and may be difficult to carry out.

３．文字認識

本実施例では、複数の棄却指標を効果的に組み合わせた棄却方式を自動的に構成することができる。これによって、複数の棄却指標を組み合わせるための人的コストを削減できる。また、正読率を高水準に維持したまま、誤読率を削減することができ、精緻かつ高速な棄却方式を構成することができる。
本実施例の文字認識装置の処理を図を用いて説明する。
図１７は、本発明の実施例の文字認識装置の処理を説明するフローチャートの例である。
文書の画像化１０１、前処理１０２、レイアウト解析１０３、文字列抽出１０４、文字切出１０５、文字識別１０６、認識結果選定１１４、リトライ判定１１５、認識後処理１１６は図５及びその説明箇所で記載したように、本発明の関連技術文字認識装置の処理と同様である。
3. Character recognition

In the present embodiment, a rejection method that effectively combines a plurality of rejection indexes can be automatically configured. Thereby, the human cost for combining a plurality of rejection indicators can be reduced. Further, the misreading rate can be reduced while maintaining the correct reading rate at a high level, and a precise and high-speed rejection method can be configured.
Processing of the character recognition device of this embodiment will be described with reference to the drawings.
FIG. 17 is an example of a flowchart for explaining processing of the character recognition apparatus according to the embodiment of the present invention.
Document imaging 101, preprocessing 102, layout analysis 103, character string extraction 104, character extraction 105, character identification 106, recognition result selection 114, retry determination 115, and post-recognition processing 116 are the same as the processing of the related-art character recognition device described with reference to FIG. 5.

以下では、文字認識１７０７の内部の棄却判定部である処理１７０１から処理１７０６までの処理について説明する。棄却処理では、ＣＰＵ２０６は、文字識別１０６の結果と、棄却値とを用いて、棄却判断を行う。ＣＰＵ２０６は、棄却と判定された場合には、当該文字認識結果に棄却フラグを立てるなどして、後の処理や、ユーザに知らせ、その結果を利用できるようにする。
本実施例の棄却組合せの構成には、予め、棄却したい画像サンプルを集めた棄却画像データベースと正読させたい画像サンプルを集めた正読画像データベースを準備しておく。棄却画像データベースは、文字識別１０６で誤読してしまうサンプル、非文字画像、曖昧文字画像、かすれ画像、つぶれ画像など、棄却したい画像サンプルを集めたデータベースである。正読画像データベースは、文字識別１０６の処理で正しく文字識別できるものなど、正読させたい文字画像サンプルを集めたデータベースである。以下では、正読画像データベースのサンプルのうち棄却判定されるものの割合を誤棄却率、棄却画像データベースのサンプルのうち棄却判定されないものの割合を誤受理率とよぶことにする。誤棄却率、誤受理率がともに小さいほど、棄却判定の精度が良いことになる。
以下では、ｎ個の棄却値算出部があるとして、棄却値に棄却値１、棄却値２、…、棄却値ｎのように、番号を付ける。また、画像ｘを入力として、棄却値を出力する関数（棄却関数）をｒ１（ｘ）、ｒ２（ｘ）、…、ｒｎ（ｘ）などと書くことにする。
棄却値の性質について簡単に説明しておく。棄却関数ｒｉ（ｘ）は、棄却したいサンプルに対しては高い値をとり、棄却したくないサンプルに対しては低い値をとるような性質をもつように構成されたものである。例えば、すでに述べたように、かすれ度、つぶれ度や、識別関数の値を用いて計算される非文字度、曖昧度などである。閾値ｈ１を設けておき、ｒｉ（ｘ）＞ｈ１のときに棄却する、などとして用いる。このとき、ｈ１が大きすぎると、十分に棄却することができず、誤読率が高くなる。一方で、ｈ１が低すぎると、誤読率は小さくなるが、正読率も落ちてしまう。そのため、ユーザの要求に応じて、正読率をなるべく落とさないように、かつ、誤読率を減少させるように、ｈ１を調整する。
The following describes processes 1701 to 1706, which form the rejection determination unit inside the character recognition 1707. In the rejection processing, the CPU 206 makes a rejection decision using the result of the character identification 106 and the rejection values. When a result is determined to be rejected, the CPU 206 sets a rejection flag on that character recognition result, for example, so that subsequent processing and the user are informed and can make use of the decision.
To configure the rejection combination of this embodiment, a reject image database collecting image samples that should be rejected and a correct reading image database collecting image samples that should be read correctly are prepared in advance. The reject image database collects image samples to be rejected, such as samples misread by the character identification 106, non-character images, ambiguous character images, blurred images, and collapsed images. The correct reading image database collects character image samples that should be read correctly, such as those correctly identified by the character identification 106. In the following, the proportion of samples in the correct reading image database that are rejected is called the false rejection rate, and the proportion of samples in the reject image database that are not rejected is called the false acceptance rate. The smaller both rates are, the more accurate the rejection determination.
In the following, assuming that there are n rejection value calculation units, the rejection values are numbered rejection value 1, rejection value 2, ..., rejection value n. The functions (rejection functions) that take an image x as input and output a rejection value are written r1(x), r2(x), ..., rn(x).
The nature of the rejection values is briefly described here. A rejection function ri(x) is constructed so that it takes a high value for samples that should be rejected and a low value for samples that should not. Examples, as already mentioned, are the degree of blurring, the degree of collapse, and the non-character degree and ambiguity degree calculated from the values of the discriminant function. A threshold hi is set, and a result is rejected when ri(x) > hi. If hi is too large, not enough is rejected and the misreading rate rises. If hi is too small, the misreading rate falls, but so does the correct reading rate. The threshold hi is therefore adjusted to the user's requirements so that the misreading rate is reduced while the correct reading rate drops as little as possible.
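The effect of the threshold on the two error rates defined above can be illustrated as follows. This is a minimal sketch: the two lists stand in for the rejection values of one index on the correct reading image database and on the reject image database, and the numbers and function names are invented for illustration.

```python
def false_rejection_rate(correct_values, h):
    """Share of correct-reading samples that are rejected (value > h)."""
    return sum(v > h for v in correct_values) / len(correct_values)


def false_acceptance_rate(reject_values, h):
    """Share of reject-database samples that are not rejected (value <= h)."""
    return sum(v <= h for v in reject_values) / len(reject_values)


# Hypothetical rejection values of a single index on the two databases.
correct_values = [0.1, 0.2, 0.3, 0.6]
reject_values = [0.4, 0.7, 0.8, 0.9]

for h in (0.05, 0.5, 0.95):
    print(h, false_rejection_rate(correct_values, h),
          false_acceptance_rate(reject_values, h))
```

Sweeping the threshold this way shows the trade-off described above: a low threshold rejects correct samples, and a high threshold lets samples that should be rejected through.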

図１６は、二つの棄却値に対して、いずれかの棄却値で閾値を超えたときに棄却と判断する場合に、棄却と判断される値の領域を斜線により示した。棄却値１が閾値１を超えた場合、または、棄却値２が閾値２を超えた場合に棄却されるので、棄却領域は図１６の斜線部のようになる。
本実施例では、これらｎ個の棄却値を棄却強度が強い順に配置する。棄却強度が強いとは、当該棄却値に基づく棄却判定の棄却率が高いことを意味する。棄却強度の定め方の例をいくつか挙げる。
一つ目の例を挙げる。まず、誤棄却率と誤受理率の和ｅを指定する。各棄却関数ｒｉに対して、ｒｉ（ｘ）＞ｈｉによって棄却判定を行った場合の誤棄却率と誤受理率の和ｅが最も小さくなるように、ｈｉを設定する。このとき、ｒｉ（ｘ）＞ｈｉのときに棄却することによる棄却判定を行った場合の学習用文字画像データベースのサンプルの棄却率が高い順に、棄却値を選定する。
二つ目の例を挙げる。各棄却関数ｒｉに対して、予めユーザによって閾値ｈｉが指定されているとする。このとき、ｒｉ（ｘ）＞ｈｉのときに棄却することによる棄却判定を行った場合の学習用文字画像データベースの棄却率が高い順に、棄却値を選定する。
いま、棄却値が３つ存在し、ｒ１、ｒ２、ｒ３の順に棄却率が高い、すなわち、棄却強度が高いとする。このとき、図１７の処理１７０１から処理１７０６のような順で処理を行う。つまり、棄却値１算出１７０１で入力画像ｘに対する棄却値ｒ１（ｘ）を算出し、棄却判定１（１７０２）で、ｒ１（ｘ）＞ｈ１であれば、棄却と判定し、そうでなければ、棄却しない。棄却と判定された場合には、後の棄却処理である処理１７０３から処理１７０６までの処理をスキップする。棄却と判定されなかった場合には、次の処理１７０３に移る。以下、同様にして、棄却判定２の処理、又は、棄却判定２と棄却判定３の処理を続ける。例では、棄却値が３つの場合について説明したが、２個以上のいずれの個数の棄却値がある場合にも同様である。
本実施例では、棄却と判定された時点で処理を終えることができる。さらに、棄却率が高い順番に先に配置されているため、計算コスト上、効率的である。
FIG. 16 shows, by hatching, the region of values judged as rejection for two rejection values when a result is rejected whenever either rejection value exceeds its threshold. Because a result is rejected when rejection value 1 exceeds threshold 1 or when rejection value 2 exceeds threshold 2, the rejection region is the hatched portion of FIG. 16.
In this embodiment, these n rejection values are arranged in descending order of the rejection strength. A high rejection strength means that the rejection rate of the rejection determination based on the rejection value is high. Here are some examples of how to set the rejection strength.
As a first example, consider the sum e of the false rejection rate and the false acceptance rate. For each rejection function ri, hi is set so that e is smallest when rejection is decided by ri(x) > hi. The rejection values are then arranged in descending order of the rejection rate obtained on the samples of the learning character image database when rejecting on ri(x) > hi.
As a second example, assume that a threshold hi has been designated in advance by the user for each rejection function ri. The rejection values are then arranged in descending order of the rejection rate obtained on the learning character image database when rejecting on ri(x) > hi.
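Both ways of fixing the order can be sketched as below. The candidate thresholds are assumed to be the observed rejection values themselves, and the function names and sample data are hypothetical; the source does not say how the minimization of e is carried out.

```python
def best_threshold(correct_values, reject_values):
    """First example: pick the h that minimizes e = false rejection + false acceptance."""
    def e(h):
        frr = sum(v > h for v in correct_values) / len(correct_values)
        far = sum(v <= h for v in reject_values) / len(reject_values)
        return frr + far
    # Assumption: only observed values are tried as candidate thresholds.
    candidates = sorted(set(correct_values) | set(reject_values))
    return min(candidates, key=e)


def order_by_rejection_rate(training_values, thresholds):
    """Order index names by rejection rate on the learning set, highest first.

    training_values maps an index name to its values r_i(x) on the learning
    samples; thresholds maps the same names to h_i (chosen by either example).
    """
    def rate(name):
        values = training_values[name]
        return sum(v > thresholds[name] for v in values) / len(values)
    return sorted(training_values, key=rate, reverse=True)


h = best_threshold([0.1, 0.2, 0.3, 0.35], [0.4, 0.7, 0.8, 0.9])
order = order_by_rejection_rate(
    {"r3": [0.1, 0.1, 0.1, 0.2], "r1": [0.9, 0.8, 0.1, 0.2], "r2": [0.9, 0.1, 0.1, 0.2]},
    {"r1": 0.5, "r2": 0.5, "r3": 0.5},
)
print(h, order)
```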
Now suppose there are three rejection values and that the rejection rate, that is, the rejection strength, decreases in the order r1, r2, r3. Processing is then performed in the order of processes 1701 to 1706 in FIG. 17. That is, the rejection value 1 calculation 1701 calculates the rejection value r1(x) for the input image x, and the rejection determination 1 (1702) rejects if r1(x) > h1 and does not reject otherwise. If the result is rejected, the subsequent rejection processes 1703 to 1706 are skipped. If it is not rejected, processing moves on to process 1703, and rejection determination 2, or rejection determinations 2 and 3, follow in the same way. The example uses three rejection values, but the same applies to any number of rejection values of two or more.
In this embodiment, processing can end as soon as a result is determined to be rejected. Furthermore, because the rejection values are arranged with the highest rejection rate first, this is efficient in terms of calculation cost.
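The serial arrangement with early exit can be sketched as follows. The stage list, the dictionary-based input, and the returned count of computed values are illustrative assumptions; the count is included only to make the saving in calculation visible.

```python
def cascade_reject(x, stages):
    """Run rejection stages in order and stop at the first rejection.

    stages is a list of (rejection_function, threshold) pairs, assumed to be
    already ordered by rejection strength. Returns (rejected, values_computed).
    """
    for computed, (r, h) in enumerate(stages, start=1):
        if r(x) > h:
            return True, computed  # later stages are skipped
    return False, len(stages)


# Hypothetical rejection functions reading precomputed values from a dict.
stages = [
    (lambda x: x["blur"], 0.5),
    (lambda x: x["collapse"], 0.5),
    (lambda x: x["offset"], 0.5),
]

print(cascade_reject({"blur": 0.9, "collapse": 0.0, "offset": 0.0}, stages))
print(cascade_reject({"blur": 0.1, "collapse": 0.1, "offset": 0.1}, stages))
```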

図２は、本実施例の文字認識装置の一例を示す構成図であり、実施例１と同様である。図１７に、本実施例の文字認識装置の処理の流れを示す。文書の画像化１０１、前処理１０２、レイアウト解析１０３、文字列抽出１０４、文字切出１０５、文字識別１０６、認識結果選定１１４、リトライ判定１１５、認識後処理１１６の文字認識装置の処理も、実施例１と同様である。また、文字識別１０６も実施例１と同様である。
本実施例では、棄却判定部の各１７０１〜１７０６の処理の流れが異なる。
実施例１では、棄却値算出処理と棄却判定処理を棄却強度が強い順に配置した。棄却値算出の計算コストにあまり差が無い場合にはこの方法で十分であるが、そうでない場合には、非効率である場合がある。例えば、棄却率が高くとも、棄却値算出の計算コストが高い棄却値算出処理が先にあると、常に計算コストが高い棄却値を計算することになる。ここで、計算コストは、例えば、学習用文字画像データベースに含まれる画像を処理する場合の棄却関数の計算にかかる平均処理時間などとして求める。
そのため、本実施例では、各棄却値算出の計算コスト（処理時間）も考慮して、処理の順番を定める。つまり、棄却値の棄却率と計算コスト（処理時間）に基づいて定めた棄却効率をもとに、棄却効率が高い程、先に配置するような構成にしてもよい。棄却効率は、例えば、棄却率×計算コスト（平均処理時間）、で算出できる。
図２１は、棄却関数の説明図である。
本実施例の棄却の構成を表で示すと、図２１の表２１０１のようになる。表の各行（横方向）は並列の並びを示し、合成する棄却関数とその合成関数、列方向（縦方向）は直列での並びを示す。本実施例の場合には、いずれの棄却関数も直列につないでいるため、各列は１つの棄却関数である。棄却値１算出１７０１、棄却値２算出１７０３、棄却値３算出１７０５は、それぞれ、ｆ１（ｒ１（ｘ））、ｆ２（ｒ２（ｘ））、ｆ３（ｒ３（ｘ））、を計算して棄却値とするが、本実施例のように、並列方向に１つの棄却関数しかない場合には、ｆ１、ｆ２、ｆ３は恒等関数として、例えば、ｆ１（ｒ１（ｘ））＝ｒ１（ｘ）としてよい。
FIG. 2 is a configuration diagram showing an example of the character recognition device of this embodiment, and is the same as in Example 1. FIG. 17 shows the processing flow of the character recognition device of this embodiment. Document imaging 101, preprocessing 102, layout analysis 103, character string extraction 104, character extraction 105, character identification 106, recognition result selection 114, retry determination 115, and post-recognition processing 116 are also the same as in Example 1, as is the character identification 106.
In this embodiment, the flow of processes 1701 to 1706 of the rejection determination unit differs.
In Example 1, the rejection value calculation and rejection determination processes were arranged in descending order of rejection strength. This is sufficient when the calculation costs of the rejection values differ little, but may be inefficient otherwise. For example, if a rejection value calculation with a high calculation cost comes first, that costly rejection value is always calculated, even if its rejection rate is high. Here, the calculation cost is obtained as, for example, the average processing time of the rejection function over the images in the learning character image database.
Therefore, in this embodiment, the processing order is determined with the calculation cost (processing time) of each rejection value also taken into account. That is, a rejection efficiency may be defined from the rejection rate and the calculation cost (processing time) of each rejection value, and processes with higher rejection efficiency may be placed earlier. The rejection efficiency can be calculated as, for example, rejection rate × calculation cost (average processing time).
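Ordering by rejection efficiency can be sketched as below. The text gives rejection rate × calculation cost as one way to compute the efficiency. The sketch assumes a different form, rejection rate divided by average processing time, so that a cheaper index with the same rejection rate is ranked earlier. This ratio, the function name, and the sample figures are assumptions made for illustration, not the source's definition.

```python
def order_by_efficiency(indexes):
    """Order rejection indexes by an assumed efficiency, highest first.

    indexes is a list of (name, rejection_rate, average_time_ms) tuples.
    Assumption: efficiency = rejection_rate / average_time_ms.
    """
    ranked = sorted(indexes, key=lambda item: item[1] / item[2], reverse=True)
    return [name for name, _rate, _cost in ranked]


# Hypothetical measurements on a learning database.
indexes = [
    ("r1", 0.30, 10.0),  # strong but slow
    ("r2", 0.20, 1.0),   # slightly weaker, much cheaper
    ("r3", 0.05, 0.5),
]
print(order_by_efficiency(indexes))
```

With these figures the strong but slow index r1 moves to the end of the cascade, which is the situation the paragraph above describes.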
FIG. 21 is an explanatory diagram of the rejection function.
The rejection configuration of this embodiment, shown as a table, is Table 2101 of FIG. 21. Each row (horizontal direction) of the table represents a parallel arrangement, listing the rejection functions to be combined and their combining function, and the column direction (vertical direction) represents the serial arrangement. In this embodiment all rejection functions are connected in series, so each row contains a single rejection function. The rejection value 1 calculation 1701, rejection value 2 calculation 1703, and rejection value 3 calculation 1705 calculate f1(r1(x)), f2(r2(x)), and f3(r3(x)), respectively, as rejection values; when there is only one rejection function in the parallel direction, as in this embodiment, f1, f2, and f3 may be identity functions, for example f1(r1(x)) = r1(x).

図２は、本実施例の文字認識装置の一例を示す構成図であり、実施例１と同様である。図１８に、本実施例の文字認識装置の処理の流れを示す。文書の画像化１０１、前処理１０２、レイアウト解析１０３、文字列抽出１０４、文字切出１０５、文字識別１０６、認識結果選定１１４、リトライ判定１１５、認識後処理１１６の文字認識装置の処理も、実施例１と同様である。また、文字識別１０６も実施例１と同様である。
本実施例では、文字認識１８０５における棄却判定を行う処理１８０１〜１８０４が異なる。本実施例では、処理１８０１〜１８０３に示すように、複数の棄却値を平行して算出し、それらの値に基づいて、処理１８０４において棄却判定処理を行う。
まず、このように棄却値算出を並列につなぐ理由について説明する。
図１６は、二つの棄却値に対して、いずれかの棄却値で閾値を超えたときに棄却と判断する場合に、棄却と判断される値の領域を斜線により示した。棄却値１が閾値１を超えた場合、または、棄却値２が閾値２を超えた場合に棄却されるので、棄却領域は図１６の斜線部のようになる。これは、実施例１や実施例２のように棄却値算出と棄却判定を順に行い、直列に処理を繋いだ場合に相当する。
図４は、２つの棄却値の値と、棄却したいサンプル、正読したいサンプルの分布を模式的に表したものである。三角が棄却画像データベースのサンプルを表し、丸が正読画像データベースのサンプルを表す。このような分布の場合には、正読画像データベースのサンプルの分布と棄却画像データベースのサンプルの分布の境界が、図４の境界線のようになっており、棄却すべきサンプルは、この境界線よりも右上の側に位置している。一方で、棄却を直列に行った場合には、図１６のような棄却領域となり、この例では、多数の棄却すべきサンプルを棄却できなくなってしまう。これらの棄却すべきサンプルが棄却できるように、閾値１と閾値２の値を小さくすると、今度は、正読させたい丸のサンプルを多数棄却してしまうことになる。
このようなことから、本実施例では、棄却値１と棄却値２の両方の値に基づいて棄却判断を行う。つまり、棄却値１の値をｘ１、棄却値２の値をｘ２としたとき、これらを引数にとる関数ｆ（ｘ１、ｘ２）により新たな棄却値を定め、ｆ（ｘ１、ｘ２）の値が一定の閾値以上の場合に棄却する。ｆ（ｘ１、ｘ２）としては、例えば、ｆ（ｘ１、ｘ２）＝ｘ１＋ｘ２を用いることができる。関数ｆ（ｘ１、ｘ２）の定め方について、もうひとつ例を挙げる。
FIG. 2 is a configuration diagram showing an example of the character recognition device of this embodiment, and is the same as in Example 1. FIG. 18 shows the processing flow of the character recognition device of this embodiment. Document imaging 101, preprocessing 102, layout analysis 103, character string extraction 104, character extraction 105, character identification 106, recognition result selection 114, retry determination 115, and post-recognition processing 116 are also the same as in Example 1, as is the character identification 106.
In the present embodiment, processes 1801 to 1804 for performing rejection determination in character recognition 1805 are different. In this embodiment, as shown in processes 1801 to 1803, a plurality of rejection values are calculated in parallel, and a rejection determination process is performed in process 1804 based on these values.
First, the reason why the rejection value calculation is connected in parallel will be described.
FIG. 16 shows, by hatching, the region of values judged as rejection for two rejection values when a result is rejected whenever either rejection value exceeds its threshold. Because a result is rejected when rejection value 1 exceeds threshold 1 or when rejection value 2 exceeds threshold 2, the rejection region is the hatched portion of FIG. 16. This corresponds to the case where rejection value calculation and rejection determination are performed in turn and the processes are connected in series, as in Examples 1 and 2.
FIG. 4 schematically shows the values of two rejection values and the distributions of samples to be rejected and samples to be read correctly. Triangles represent samples of the reject image database, and circles represent samples of the correct reading image database. With such a distribution, the boundary between the two distributions is like the boundary line in FIG. 4, and the samples to be rejected lie on the upper right side of this line. If rejection is performed in series, on the other hand, the rejection region is as in FIG. 16, and in this example many samples that should be rejected can no longer be rejected. If thresholds 1 and 2 are lowered so that these samples can be rejected, many of the circle samples that should be read correctly are rejected instead.
For this reason, in this embodiment, the rejection determination is performed based on both the rejection value 1 and the rejection value 2. That is, when the value of the rejection value 1 is x1, and the value of the rejection value 2 is x2, a new rejection value is determined by a function f (x1, x2) that takes these as arguments, and the value of f (x1, x2) is Reject if it is above a certain threshold. For example, f (x1, x2) = x1 + x2 can be used as f (x1, x2). Another example of how to define the function f (x1, x2) will be described.

関数ｆ（ｘ１、ｘ２）は、ａ１１、ａ２２、ａ１２、ａ１、ａ２、ａ０をパラメータとしてもつｘ１、ｘ２の二次関数ｆ（ｘ１、ｘ２）＝ａ１１ｘ１ｘ１＋ａ２２ｘ２ｘ２＋ａ１２ｘ１ｘ２＋ａ１ｘ１＋ａ２ｘ２＋ａ０、として定義する。このパラメータａ１１、ａ２２、ａ１２、ａ１、ａ２、ａ０を、正読画像データベースのサンプルに対して負の値をとるように、棄却画像データベースのサンプルに対して正の値をとるように、設定する。ただし、全てのサンプルに対してこの条件を満たすようなパラメータを設定することは、一般には不可能な場合が想定されるので、パラメータを引数にとり、条件を満たさない度合いを示すコスト関数（損失関数）（又は、正読画像データベースのサンプルと、棄却画像データベースのサンプルとの識別誤差に基づくコスト関数）ｃ（ｆ）を定義し、この値が小さくなるように、機械学習によって学習する。例えば、棄却画像データベースのサンプルに対してはｆが１、正読画像データベースのサンプルに対してはｆが−１をとる方向に学習するとして、ｃ（ｆ）をこれらの値からの全サンプルに対する二乗誤差の和とする。ｃ（ｆ）は、例えば、棄却画像データベースのサンプルから計算されるｆの値と１との二乗誤差の和をｖ１＝Σ｜ｆ−１｜＾２、正読画像データベースのサンプルから計算されるｆの値と−１との二乗誤差の和をｖ２＝Σ｜ｆ＋１｜＾２とし、ｃ（ｆ）＝ｖ１＋ｖ２（二乗誤差の和）などとする。例えば、ニューラルネットワークやＳＶＭなどを用いることができる。このようにして作成したｆのｆ＝０となる等高線は、図４の境界線のように、正読画像データベースの分布と棄却画像データベースのサンプルの境界線となる。なお、ここでは、ｆは二次関数を例として説明したが、より一般の関数、例えば、より高次の関数や、ニューラルネットワーク、動径基底関数の線型結合なども用いることができる。
以上、説明を簡単にするために、２つの棄却値をもつ場合について説明したが、３つ以上の棄却値の場合も同様である。図１８には、３つの棄却値が存在する場合に処理の流れを示している。処理１８０１、処理１８０２、処理１８０３では、それぞれ棄却値１、棄却値２、棄却値３、を算出する。それぞれの棄却値をｘ１、ｘ２、ｘ３とする。棄却判定３（１８０４）では、上記で説明したようにして作成した新たな棄却値ｆ（ｘ１、ｘ２、ｘ３）に基づいて、ｆ（ｘ１、ｘ２、ｘ３）が予め定めておいた閾値より大きい場合には棄却とし、そうでない場合には、棄却しない。
本実施例の方法は、直列につなぐよりも精度のよい棄却を行うことができる。しかし、全ての棄却値を算出しなければならない上、それらの棄却値に基づいてｆの値も計算する必要がある。そのため、棄却にかかる計算コストは大きくなる場合が想定される。
本実施例の棄却の構成を表で示すと図２１の表２１０２のようになる。表の各行（横方向）は並列の並びを示し、合成する棄却関数とその合成関数、列方向（縦方向）は直列での並びを示す。本実施例の場合には、いずれの棄却関数も並列につないでいるため、１行である。合成関数はｆで、棄却判定１８０４で算出される値は、ｆ（ｒ１（ｘ）、ｒ２（ｘ）、ｒ３（ｘ））となる。ｆは、例えば、上記で説明した方法で作成した関数である。
The function f(x1, x2) is defined as a quadratic function of x1 and x2 with parameters a11, a22, a12, a1, a2, and a0: f(x1, x2) = a11·x1^2 + a22·x2^2 + a12·x1·x2 + a1·x1 + a2·x2 + a0. The parameters a11, a22, a12, a1, a2, and a0 are set so that f takes negative values for samples of the correct reading image database and positive values for samples of the reject image database. In general, however, it may not be possible to find parameters that satisfy this condition for every sample. A cost function (loss function) c(f) is therefore defined that takes the parameters as arguments and expresses the degree to which the condition is violated (or a cost function based on the discrimination error between samples of the correct reading image database and samples of the reject image database), and the parameters are learned by machine learning so that this value becomes small. For example, learning may aim at f = 1 for samples of the reject image database and f = −1 for samples of the correct reading image database, with c(f) the sum over all samples of the squared errors from these targets. Concretely, let v1 = Σ|f − 1|^2 be the sum of squared errors between 1 and the values of f calculated from the samples of the reject image database, and let v2 = Σ|f + 1|^2 be the sum of squared errors between −1 and the values of f calculated from the samples of the correct reading image database; then c(f) = v1 + v2. A neural network, an SVM, or the like can be used, for example. The contour f = 0 of the function f created in this way serves, like the boundary line in FIG. 4, as the boundary between the distribution of the correct reading image database and the samples of the reject image database. Although f has been described here as a quadratic function, more general functions can also be used, such as higher-order functions, neural networks, and linear combinations of radial basis functions.
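The quadratic form and the squared-error cost can be written out directly. The sketch below only evaluates c(f) for given parameters and does not perform the learning step; the sample points and the two parameter sets are invented for illustration.

```python
def quadratic_f(params, x1, x2):
    """f(x1, x2) = a11*x1^2 + a22*x2^2 + a12*x1*x2 + a1*x1 + a2*x2 + a0."""
    a11, a22, a12, a1, a2, a0 = params
    return a11 * x1 * x1 + a22 * x2 * x2 + a12 * x1 * x2 + a1 * x1 + a2 * x2 + a0


def cost(params, reject_samples, correct_samples):
    """c(f) = v1 + v2, with target +1 for reject samples and -1 for correct ones."""
    v1 = sum((quadratic_f(params, x1, x2) - 1) ** 2 for x1, x2 in reject_samples)
    v2 = sum((quadratic_f(params, x1, x2) + 1) ** 2 for x1, x2 in correct_samples)
    return v1 + v2


# Hypothetical (rejection value 1, rejection value 2) pairs.
reject_samples = [(0.8, 0.7), (0.6, 0.9)]
correct_samples = [(0.1, 0.2), (0.3, 0.1)]

zero = (0, 0, 0, 0, 0, 0)     # f is identically 0
linear = (0, 0, 0, 2, 2, -2)  # f = 2*x1 + 2*x2 - 2, a hand-picked candidate

print(cost(zero, reject_samples, correct_samples))
print(cost(linear, reject_samples, correct_samples))
```

A learning procedure would search for the parameters that make this cost small; the hand-picked candidate already scores lower than the all-zero parameters and has the required signs on these samples.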
To keep the explanation simple, the case of two rejection values has been described, but the same applies to three or more. FIG. 18 shows the processing flow when there are three rejection values. Processes 1801, 1802, and 1803 calculate rejection value 1, rejection value 2, and rejection value 3, respectively; let these be x1, x2, and x3. In the rejection determination 3 (1804), using the new rejection value f(x1, x2, x3) created as described above, the result is rejected if f(x1, x2, x3) is larger than a predetermined threshold and is not rejected otherwise.
The method of this embodiment can reject more accurately than a serial connection. However, all of the rejection values must be calculated, and the value of f must also be calculated from them, so the calculation cost of rejection may be high.
The rejection configuration of this embodiment, shown as a table, is Table 2102 of FIG. 21. Each row (horizontal direction) of the table represents a parallel arrangement, listing the rejection functions to be combined and their combining function, and the column direction (vertical direction) represents the serial arrangement. In this embodiment all rejection functions are connected in parallel, so the table has a single row. The combining function is f, and the value calculated in the rejection determination 1804 is f(r1(x), r2(x), r3(x)). The function f is, for example, one created by the method described above.
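The table form of a rejection configuration lends itself to a small evaluator in which each row is a parallel group and the rows are tried in series. The row layout (functions, combining function, threshold), the sum used as combining function, and all names below are assumptions for illustration; FIG. 21 itself is not reproduced here.

```python
def evaluate_table(x, rows):
    """Evaluate a rejection table, top row first.

    Each row is (rejection_functions, combine, threshold). The functions of a
    row are combined in parallel by `combine`; the rows are tried in series.
    """
    for functions, combine, threshold in rows:
        value = combine(*(r(x) for r in functions))
        if value > threshold:
            return True  # rejected, remaining rows are skipped
    return False


# Hypothetical rejection functions reading from a tuple of precomputed values.
r1 = lambda x: x[0]
r2 = lambda x: x[1]
r3 = lambda x: x[2]
identity = lambda v: v

# All-serial layout: three rows with one function each and identity combiners.
serial_table = [([r1], identity, 0.5), ([r2], identity, 0.5), ([r3], identity, 0.5)]
# All-parallel layout: one row combining the three functions.
parallel_table = [([r1, r2, r3], lambda a, b, c: a + b + c, 1.0)]

print(evaluate_table((0.4, 0.4, 0.4), serial_table))
print(evaluate_table((0.4, 0.4, 0.4), parallel_table))
```

The same evaluator also covers mixed tables, in which some rows hold one function and others hold several.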

図２は、本実施例の文字認識装置の一例を示す構成図であり、実施例１と同様である。図１に、本実施例の文字認識装置の処理の流れを示す。文書の画像化１０１、前処理１０２、レイアウト解析１０３、文字列抽出１０４、文字切出１０５、文字識別１０６、認識結果選定１１４、リトライ判定１１５、認識後処理１１６の文字認識装置の処理も、実施例１と同様である。また、文字識別１０６も実施例１と同様である。
本実施例では、文字認識１１７における棄却判定を行う処理の組み合わせ（１０７〜１１３の部分に相当）が異なる。
本実施例の棄却組合せの構成には、予め、棄却したい画像サンプルを集めた棄却画像データベースと正読させたい画像サンプルを集めた正読画像データベースを準備しておく。棄却画像データベースは、文字識別１０６で誤読してしまうサンプル、非文字画像、曖昧文字画像、かすれ画像、つぶれ画像など、棄却したい画像サンプルを集めたデータベースである。正読画像データベースは、文字識別１０６の処理で正しく文字識別できるものなど、正読させたい文字画像サンプルを集めたデータベースである。以下では、正読画像データベースのサンプルのうち棄却判定されるものの割合を誤棄却率、棄却画像データベースのサンプルのうち棄却判定されないものの割合を誤受理率とよぶことにする。誤棄却率、誤受理率がともに小さいほど、棄却判定の精度が良いことになる。
以下では、ｎ個の棄却値算出部があるとして、棄却値に棄却値１、棄却値２、…、棄却値ｎのように、番号を付ける。また、画像ｘを入力として、棄却値を出力する関数をｒ１（ｘ）、ｒ２（ｘ）、…、ｒｎ（ｘ）などと書くことにする。
FIG. 2 is a configuration diagram showing an example of the character recognition device of this embodiment, and is the same as in Example 1. FIG. 1 shows the processing flow of the character recognition device of this embodiment. Document imaging 101, preprocessing 102, layout analysis 103, character string extraction 104, character extraction 105, character identification 106, recognition result selection 114, retry determination 115, and post-recognition processing 116 are also the same as in Example 1, as is the character identification 106.
In the present embodiment, the combination of processes for performing rejection determination in the character recognition 117 (corresponding to the portions 107 to 113) is different.
To configure the rejection combination of this embodiment, a reject image database collecting image samples that should be rejected and a correct reading image database collecting image samples that should be read correctly are prepared in advance. The reject image database collects image samples to be rejected, such as samples misread by the character identification 106, non-character images, ambiguous character images, blurred images, and collapsed images. The correct reading image database collects character image samples that should be read correctly, such as those correctly identified by the character identification 106. In the following, the proportion of samples in the correct reading image database that are rejected is called the false rejection rate, and the proportion of samples in the reject image database that are not rejected is called the false acceptance rate. The smaller both rates are, the more accurate the rejection determination.
In the following, assuming that there are n rejection value calculation units, the rejection values are numbered rejection value 1, rejection value 2, ..., rejection value n. The functions that take an image x as input and output a rejection value are written r1(x), r2(x), ..., rn(x).

本実施例では、これらｎ個の棄却値を棄却値同士の独立性の高さ、独立性の低さ（相関性の高さ）、棄却効率を考慮しながら、組み合わせる。本実施例の棄却値算出器の組み合わせの方針は、棄却値算出器を直列または並列につなぐ。その組み合わせ方は、独立性の高い棄却値算出器同士は直列に組合せ、独立性の低い（相関性の高い）棄却値算出器同士は並列に組合せ、棄却強度が強い棄却値算出器ほど先に配置する。また、並列に組み合わせる場合には、組み合わせた複数の棄却値に基づいて、新たな棄却値を定め、それに基づいて棄却判断を行う。さらに、棄却効率が高い処理ほど先に配置する。
図１６は、二つの棄却値に対して、いずれかの棄却値で閾値を超えたときに棄却と判断する場合に、棄却と判断される値の領域を斜線により示した。棄却値１が閾値１を超えた場合、または、棄却値２が閾値２を超えた場合に棄却されるので、棄却領域は図１６の斜線部のようになる。
まず、図３を用いて、棄却値同士の独立性について説明する。図３は、２つの棄却値の値と、棄却したいサンプル、正読したいサンプルの分布を模式的に表したものである。三角が棄却画像データベースのサンプルを表し、丸が正読画像データベースのサンプルを表す。このような分布では、正読画像データベースのサンプルの分布と棄却画像データベースのサンプルの分布の境界線が、図３のように、右上方向に大きく凸状になる。このような場合に、２つの棄却値は独立性が高いと呼ぶことにする。このような状況は、２つの棄却値が独立性の高い事象を基に棄却値を算出する場合に起こり得る。例えば、棄却値１は、文字のかすれ度を計算しており、棄却値２は、文字の重心位置の標準的な重心位置からの乖離の大きさを計算している場合などである。
In this embodiment, these n rejection values are combined in consideration of how independent the rejection values are of one another, how dependent (highly correlated) they are, and the rejection efficiency. The rejection value calculators are connected in series or in parallel: calculators that are highly independent of one another are combined in series, calculators with low independence (high correlation) are combined in parallel, and calculators with stronger rejection strength are placed earlier. When calculators are combined in parallel, a new rejection value is defined from the combined rejection values, and the rejection decision is based on it. In addition, processing with higher rejection efficiency is placed earlier.
FIG. 16 shows, by hatching, the region of values judged as rejection for two rejection values when a result is rejected whenever either rejection value exceeds its threshold. Because a result is rejected when rejection value 1 exceeds threshold 1 or when rejection value 2 exceeds threshold 2, the rejection region is the hatched portion of FIG. 16.
First, the independence of rejection values is described with reference to FIG. 3. FIG. 3 schematically shows the values of two rejection values and the distributions of samples to be rejected and samples to be read correctly. Triangles represent samples of the reject image database, and circles represent samples of the correct reading image database. With such a distribution, the boundary line between the distribution of correct reading image database samples and that of reject image database samples bulges strongly toward the upper right, as in FIG. 3. In such a case, the two rejection values are said to be highly independent. This situation can arise when the two rejection values are calculated from highly independent phenomena, for example when rejection value 1 measures the degree of blurring of a character and rejection value 2 measures how far the centroid of the character deviates from the standard centroid position.

本実施例では、棄却値が独立性が高い場合には、棄却値１算出と棄却値２算出を直列に処理する。つまり、まず、棄却値１を算出した上で、閾値１より値が高い場合には棄却とする判断を行う。棄却と判定されれば、棄却処理を終える。棄却と判定されなかった場合には、棄却値２を算出した上で、閾値２より値が高い場合には棄却と判断する。棄却と判定されれば、棄却処理を終える。棄却と判定されなかった場合には、次の棄却処理に移る。図３のように閾値１、閾値２を定め、棄却値１が閾値１を超えた場合、または、棄却値２が閾値２を超えた場合に棄却と判断することで、効率良く棄却することができる。このような棄却値は直列に処理すれば良い。
次に、図４を用いて、棄却値同士の独立性の低さ（相関性の高さ）について説明する。図４は、２つの棄却値の値と、棄却したいサンプル、正読したいサンプルの分布を模式的に表したものである。三角が棄却画像データベースのサンプルを表し、丸が正読画像データベースのサンプルを表す。このような分布では、正読画像データベースのサンプルの分布と棄却画像データベースのサンプルの分布の境界線が、図４のように、図３の場合ほど凸度が大きく無い場合、直線に近い場合、または、逆に左下方向に凸となる場合、２つの棄却値は独立性が低いと呼ぶことにする。このような状況は、２つの棄却値が相関性の高い事象を基に棄却値を算出する場合に起こり得る。例えば、棄却値１は、すでに説明したような識別関数に基づく非文字度を算出しており、棄却値２は、識別関数に基づく曖昧度を算出しているような場合である。このような場合には、どちらも識別関数を基にして棄却値を計算しているため、互いに関連性をもち、図４のような分布となる。
本実施例では、棄却値の独立性が低い場合には、棄却値１算出と棄却値２算出を並列に処理する。つまり、棄却値１をｘ１、棄却値２をｘ２としたとき、これらを引数にとる関数ｆ（ｘ１、ｘ２）により新たに棄却値を定め、ｆ（ｘ１、ｘ２）の値が一定の閾値以上の場合に棄却する。棄却と判定されれば、棄却処理を終える。棄却と判定されなかった場合には、次の棄却処理に移る。関数ｆの定め方は、実施例３と同様である。図４の分布の場合には、例えば、ｆ（ｘ１、ｘ２）＝ｘ１＋ｘ２とすれば、左上から右下に斜め方向に閾値境界線を定めることができ、正読画像データベースのサンプルと棄却画像データベースのサンプルを分離することができる。図４のような分布の場合には、棄却値を直列につなぐと、棄却値１が閾値１より大きいか、または、棄却値２が閾値２より大きい領域のみが棄却され、閾値１より左で、かつ、閾値２より下に分布している三角のサンプルが棄却できない。また、これらを棄却するために閾値１や閾値２の値を下げると、今度は、正読させたい丸のサンプルを多数棄却してしまうことになる。そのため、このような棄却値は並列に繋ぐ必要がある。
以上のように、本実施例では、独立性の高い棄却値同士は直列に処理し、並列性の高い棄却値同士は並列に処理する。
In this embodiment, when the rejection values are highly independent, the rejection value 1 calculation and the rejection value 2 calculation are processed in series. That is, rejection value 1 is calculated first, and the result is rejected if the value is higher than threshold 1. If rejected, the rejection processing ends. If not rejected, rejection value 2 is calculated, and the result is rejected if the value is higher than threshold 2. If rejected, the rejection processing ends; if not, processing moves on to the next rejection process. By setting threshold 1 and threshold 2 as in FIG. 3 and rejecting when rejection value 1 exceeds threshold 1 or rejection value 2 exceeds threshold 2, rejection can be performed efficiently. Such rejection values may be processed in series.
Next, the low independence between reject values (high correlation) will be described with reference to FIG. FIG. 4 schematically shows two rejection values, distributions of samples to be rejected, and samples to be correctly read. A triangle represents a sample of the reject image database, and a circle represents a sample of the correct reading image database. In such a distribution, when the boundary between the distribution of the sample of the correct reading image database and the distribution of the sample of the rejection image database is not as large as the case of FIG. Or, conversely, when it becomes convex in the lower left direction, the two rejection values will be referred to as having low independence. Such a situation may occur when a rejection value is calculated based on an event in which two rejection values are highly correlated. For example, the rejection value 1 is a case where the non-character level based on the discrimination function as described above is calculated, and the rejection value 2 is a case where the ambiguity level is calculated based on the discrimination function. In such a case, since the rejection value is calculated based on the discriminant function, they are related to each other and have a distribution as shown in FIG.
In this embodiment, when the independence of the rejection value is low, the rejection value 1 calculation and the rejection value 2 calculation are processed in parallel. That is, when the rejection value 1 is x1 and the rejection value 2 is x2, a new rejection value is determined by a function f (x1, x2) that takes these as arguments, and the value of f (x1, x2) is equal to or greater than a certain threshold value. Reject in case of. If it is determined to be rejected, the reject process is finished. If it is not determined to be rejected, the process proceeds to the next reject process. The method for defining the function f is the same as in the third embodiment. In the case of the distribution of FIG. 4, for example, if f (x1, x2) = x1 + x2, a threshold boundary line can be defined in an oblique direction from the upper left to the lower right, and a sample of the correctly read image database and the reject image database Samples can be separated. In the case of the distribution as shown in FIG. 4, when the rejection values are connected in series, only the region where the rejection value 1 is greater than the threshold value 1 or the rejection value 2 is greater than the threshold value 2 is rejected. In addition, triangular samples distributed below the threshold value 2 cannot be rejected. Further, if the values of the threshold value 1 and the threshold value 2 are lowered in order to reject these, a large number of round samples to be read correctly will be rejected. Therefore, such rejection values need to be connected in parallel.
As described above, in this embodiment, reject values with high independence are processed in series, and reject values with high parallelism are processed in parallel.
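The difference between the two combinations can be illustrated with a minimal Python sketch. The function names, the sample point, and the threshold values below are assumptions chosen for illustration and do not appear in the patent; the parallel case uses the example f(x1, x2) = x1 + x2 from the text.

```python
def reject_serial(x1: float, x2: float, t1: float, t2: float) -> bool:
    """Serial combination: test rejection value 1, then rejection value 2."""
    if x1 > t1:
        return True   # rejected at the first stage; x2 is not needed
    return x2 > t2    # second stage runs only if the first did not reject


def reject_parallel(x1: float, x2: float, t: float) -> bool:
    """Parallel combination using the example f(x1, x2) = x1 + x2."""
    return x1 + x2 >= t


# Hypothetical sample lying left of threshold 1 and below threshold 2,
# i.e. one of the triangles that the serial combination cannot reject.
sample = (0.35, 0.35)
serial_result = reject_serial(*sample, t1=0.4, t2=0.4)
parallel_result = reject_parallel(*sample, t=0.6)
```

With these assumed numbers the serial combination accepts the sample while the diagonal boundary of the parallel combination rejects it, which is the situation the text describes for FIG. 4.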

Here, an example is given of a method for judging whether two rejection values have high independence or low independence (high correlation). Let the two rejection values be x1 and x2. Two functions g1(x1, x2) and g2(x1, x2) are defined that take these two rejection values as arguments.
The function g1(x1, x2) is a quadratic function, as in the third embodiment, and is set by machine learning based on a cost function c so that it takes a negative value for samples of the correct reading image database and a positive value for samples of the rejection image database. As shown in FIGS. 3 and 4, for example, the contour line g1 = 0 forms the boundary, and g1 is negative in the region to the lower left of the boundary and positive in the region to the upper right.
The function g2(x1, x2) has two values h1 and h2 as parameters; g2(x1, x2) = 1 when x1 > h1 or x2 > h2, and g2(x1, x2) = −1 otherwise. That is, the region where g2(x1, x2) > 0 is the rejection region. Ideally g2 would equal 1 for every sample of the rejection image database and −1 for every sample of the correct reading image database, but it is generally impossible to set parameters that satisfy this condition for all samples. A cost function c(h1, h2) is therefore defined that takes the parameters as arguments and indicates the degree to which the condition is not satisfied, and the parameters are learned by machine learning so that this value becomes small. For example, with v1 denoting the number of rejection image database samples for which g2 = −1 and v2 denoting the number of correct reading image database samples for which g2 = 1, c(h1, h2) = v1 + v2 (the number of samples that do not satisfy the condition). A neural network, an SVM, or the like can be used for the learning. The boundary between g2 = 1 and g2 = −1 obtained in this way is parallel to the axis of rejection value 1 or rejection value 2 and separates the distribution of the correct reading image database from the samples of the rejection image database. In the examples of FIGS. 3 and 4, if the dotted lines for threshold 1 and threshold 2 indicate h1 and h2 respectively, then g2 = −1 in the region to the left of threshold 1 and below threshold 2, and g2 = 1 in the region to the right of threshold 1 or above threshold 2.
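As a rough illustration of g2 and its cost c(h1, h2) = v1 + v2, the following Python sketch fits h1 and h2 by exhaustive search over candidate thresholds. The exhaustive search is a stand-in chosen for brevity, not the learning method of the patent (which mentions neural networks or SVMs), and the sample data and the name `fit_thresholds` are hypothetical.

```python
from itertools import product


def g2(x1, x2, h1, h2):
    """+1 inside the serial rejection region (x1 > h1 or x2 > h2), else -1."""
    return 1 if (x1 > h1 or x2 > h2) else -1


def cost(h1, h2, reject_samples, correct_samples):
    """c(h1, h2) = v1 + v2: false acceptances plus false rejections."""
    v1 = sum(1 for x1, x2 in reject_samples if g2(x1, x2, h1, h2) == -1)
    v2 = sum(1 for x1, x2 in correct_samples if g2(x1, x2, h1, h2) == 1)
    return v1 + v2


def fit_thresholds(reject_samples, correct_samples):
    """Assumed stand-in for learning: try every distinct split on each axis."""
    samples = reject_samples + correct_samples
    cand1 = [float("-inf")] + sorted({x1 for x1, _ in samples})
    cand2 = [float("-inf")] + sorted({x2 for _, x2 in samples})
    return min(product(cand1, cand2),
               key=lambda h: cost(h[0], h[1], reject_samples, correct_samples))


# Hypothetical (x1, x2) pairs resembling the well-separated case of FIG. 3.
correct = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3)]
reject = [(0.9, 0.1), (0.1, 0.9), (0.8, 0.8)]
h1, h2 = fit_thresholds(reject, correct)
```

Because g2 only changes value when a threshold crosses an observed sample coordinate, searching the observed coordinates (plus one value below all of them) covers every distinct axis-parallel split for this toy data set.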

The boundary between g2 = 1 and g2 = −1 given by the function g2(x1, x2) above corresponds to the boundary of the rejection region when the rejection processes are connected in series. The contour line g1 = 0 of the function g1(x1, x2), on the other hand, corresponds to the boundary of the rejection region when the rejection processes are connected in parallel by the method of the third embodiment.
The accuracy of the rejection region generated by g1 is now compared with that of the rejection region generated by g2. Let v1 be the number of rejection image database samples for which g2 = −1, v2 the number of correct reading image database samples for which g2 = 1, w1 the number of rejection image database samples for which g1 < 0, and w2 the number of correct reading image database samples for which g1 ≧ 0. Here v1 and w1 are numbers of false acceptances, and v2 and w2 are numbers of false rejections. Let h(p1, p2) be the loss function for p1 false acceptances and p2 false rejections. h is a monotonically increasing function of p1 and p2, for example h(p1, p2) = p1 + p2, which is the sum of the number of false acceptances and the number of false rejections. The smaller the value of h, the better the rejection accuracy.
Next, h(v1, v2) and h(w1, w2) are compared. In general, the rejection region given by g1 is more accurate, so h(v1, v2) is larger than h(w1, w2). The quantity D = h(v1, v2) − h(w1, w2) represents the difference in loss between the serial connection and the parallel connection. When D is equal to or greater than a certain value, the rejection region given by g2 is insufficient, and rejection value 1 and rejection value 2 are judged to have low independence. Conversely, when D is smaller than that value, rejection value 1 and rejection value 2 are judged to have high independence.
For simplicity, the description above covers two rejection values, but the same applies to three or more.
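The comparison can be sketched in Python as follows. The functions g1 and g2 here are fixed, hand-written stand-ins (a straight-line g1 rather than a learned quadratic, and a g2 with assumed thresholds of 0.4), and the samples and the judgment threshold of 1 are hypothetical; only the counting of v1, v2, w1, w2, the loss h(p1, p2) = p1 + p2, and D follow the text.

```python
def count_errors(g, reject_samples, correct_samples):
    """Return (false acceptances, false rejections) for a decision function g."""
    false_accepts = sum(1 for s in reject_samples if g(*s) < 0)
    false_rejects = sum(1 for s in correct_samples if g(*s) >= 0)
    return false_accepts, false_rejects


def loss(p1, p2):
    """Example loss h(p1, p2) = p1 + p2 from the text."""
    return p1 + p2


def independence_gap(g1, g2, reject_samples, correct_samples):
    """D = h(v1, v2) - h(w1, w2): serial loss minus parallel loss."""
    v1, v2 = count_errors(g2, reject_samples, correct_samples)
    w1, w2 = count_errors(g1, reject_samples, correct_samples)
    return loss(v1, v2) - loss(w1, w2)


# Assumed stand-ins: a diagonal boundary for g1, axis-parallel thresholds for g2.
g1 = lambda x1, x2: x1 + x2 - 0.6
g2 = lambda x1, x2: 1 if (x1 > 0.4 or x2 > 0.4) else -1

# Hypothetical samples resembling the correlated case of FIG. 4.
correct = [(0.1, 0.1), (0.4, 0.1), (0.1, 0.4)]
reject = [(0.35, 0.35), (0.6, 0.2), (0.2, 0.6), (0.5, 0.5)]

D = independence_gap(g1, g2, reject, correct)
low_independence = D >= 1   # the judgment threshold of 1 is an assumption
```

In this toy data the serial region misses the sample at (0.35, 0.35) while the diagonal boundary separates everything, so D is positive and the pair would be treated as having low independence.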

In this embodiment, processes with high independence are arranged in series, and processes with low independence are arranged in parallel. For rejection values arranged in parallel, the function g1 described above can be used as the combined rejection value, as in the third embodiment. As in the second embodiment, rejection values with higher rejection efficiency are placed earlier.
An example of how to configure the rejection values when there are n of them is given below.
FIG. 22 shows a flowchart of the rejection value configuration process. This process may be executed by the CPU 206 of the character recognition device 201 or by a processing device other than the character recognition device 201. First, the CPU 206 or the other processing device selects, from the n rejection values, the pair with the lowest independence (the largest value of D described above). If the value D for this pair is lower than a predetermined value, the selected pair is highly independent, so the n rejection values are arranged in series. If the selected pair is judged to have low independence, the pair is connected in parallel, and a new rejection value based on the two is defined in the same way as in the third embodiment. Treating the rejection values connected in parallel as a single rejection value leaves n-1 rejection values. In the same way, the CPU 206 or the other processing device selects the pair with the lowest independence from the n-1 rejection values. If the selected pair is judged to have high independence, the n-1 rejection values are arranged in series. If the selected pair is judged to have low independence, the pair is connected in parallel, and a new rejection value based on the two is defined in the same way as in the third embodiment. At this point, if a rejection value in the selected pair (call the pair r1 and r2) is itself a parallel combination of several rejection values, it is decomposed into its original constituent rejection values, all of these are connected in parallel, and a new rejection value based on them is defined in the same way as in the third embodiment. For example, if r1 was originally formed by connecting two rejection values s1 and s2 in parallel, r1 is decomposed into its original rejection values, and s1, s2, and r2 are connected in parallel. The CPU 206 or the other processing device repeats this procedure until no pair is judged to have low independence.
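The grouping loop described for FIG. 22 can be sketched in Python as below. This is a simplified illustration under stated assumptions: the caller supplies a hypothetical `gap(a, b)` function that returns the value D for two groups of rejection values, whereas in the patent D would be recomputed from the newly defined combined rejection value. The toy pairwise table and the threshold of 3 are invented for the example.

```python
from itertools import combinations


def configure(n, gap, d_threshold):
    """Greedily merge the least independent pair until none has D >= threshold.

    Returns a list of groups (frozensets of rejection-value indices). Each group
    is connected in parallel; the groups themselves are connected in series.
    """
    groups = [frozenset([i]) for i in range(n)]
    while len(groups) > 1:
        a, b = max(combinations(groups, 2), key=lambda p: gap(p[0], p[1]))
        if gap(a, b) < d_threshold:
            break  # every remaining pair is independent enough: stay in series
        # The flat union mirrors decomposing a merged value into its original
        # rejection values and reconnecting all of them in parallel.
        groups = [g for g in groups if g not in (a, b)] + [a | b]
    return groups


# Toy stand-in for D (an assumption): largest pairwise D between group members.
PAIR_D = {(0, 1): 5, (2, 3): 4, (3, 4): 6}


def toy_gap(a, b):
    return max(PAIR_D.get((min(i, j), max(i, j)), 0) for i in a for j in b)


result = configure(5, toy_gap, d_threshold=3)
```

With this toy table the loop ends with two groups, indices {0, 1} and {2, 3, 4}, which matches the shape of the configuration in FIG. 1 when the five rejection values are numbered from zero.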

FIG. 1 shows a configuration in which rejection value 1 and rejection value 2 are connected in parallel, rejection value 3, rejection value 4, and rejection value 5 are connected in parallel, and the former set and the latter set are connected in series.
Expressed as a table, the rejection configuration of FIG. 1 is as shown in table 2103 of FIG. 21. Each row (horizontal direction) of the table shows a parallel arrangement, listing the rejection functions to be combined and their combining function, and the column direction (vertical direction) shows the serial order. In the case of FIG. 1, rejection value 1 and rejection value 2 are connected in parallel, and rejection value 3, rejection value 4, and rejection value 5 are connected in parallel, so the first row lists rejection function 1 and rejection function 2, and the next row lists rejection function 3, rejection function 4, and rejection function 5. The combining functions f1 and f2 can be created, for example, by the method used to create g1 described above.
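A table of this kind could be represented and evaluated as in the following Python sketch. The row layout (functions, combining function, threshold), the plain sums used for f1 and f2, the thresholds, and the call log are all assumptions made for illustration; the patent derives f1 and f2 by learning, as for g1.

```python
calls = []  # records which rejection functions were actually evaluated


def make_rejection_function(index):
    """Hypothetical rejection function that reads a precomputed score."""
    def rejection_function(sample):
        calls.append(index)
        return sample[index]
    return rejection_function


r = [make_rejection_function(i) for i in range(5)]

# One row per parallel group; rows are processed top to bottom (in series).
table = [
    ((r[0], r[1]), lambda a, b: a + b, 1.0),              # f1 (assumed: sum)
    ((r[2], r[3], r[4]), lambda a, b, c: a + b + c, 1.5),  # f2 (assumed: sum)
]


def run_rejection(sample, table):
    """Return True if any row's combined rejection value reaches its threshold."""
    for functions, combine, threshold in table:
        values = [fn(sample) for fn in functions]
        if combine(*values) >= threshold:
            return True  # later rows are skipped entirely
    return False
```

For example, `run_rejection((0.7, 0.6, 0.0, 0.0, 0.0), table)` rejects in the first row, so only the first two rejection functions are evaluated; this is the saving in calculation that the serial arrangement is meant to provide.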

In the second, third, and fourth embodiments described above, when a parallel arithmetic device is available, rejection functions arranged in parallel may be calculated in parallel. Even for rejection functions arranged in series, the next rejection function may be calculated ahead of time when it is possible to do so. In that case, if the result of the next rejection function turns out not to be needed, it is simply discarded.
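One way to picture this speculative calculation is the Python sketch below, which uses the standard library `concurrent.futures.ThreadPoolExecutor`. The table layout, the function name, and the example rejection functions are assumptions; threads are used only to show the structure, and CPU-bound rejection functions written in pure Python would probably need processes or native code to gain real speed.

```python
from concurrent.futures import ThreadPoolExecutor


def run_rejection_speculative(sample, table, max_workers=4):
    """Start every rejection function at once, then judge rows in serial order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [[pool.submit(fn, sample) for fn in functions]
                   for functions, _combine, _threshold in table]
        for row_futures, (_functions, combine, threshold) in zip(futures, table):
            values = [future.result() for future in row_futures]
            if combine(*values) >= threshold:
                # Results of later rows are no longer needed: cancel what has
                # not started and ignore whatever has already been computed.
                for later_row in futures:
                    for future in later_row:
                        future.cancel()
                return True
    return False


# Hypothetical table: each rejection function reads one precomputed score.
table = [
    ((lambda s: s[0], lambda s: s[1]), lambda a, b: a + b, 1.0),
    ((lambda s: s[2], lambda s: s[3], lambda s: s[4]),
     lambda a, b, c: a + b + c, 1.5),
]
```

The decision is the same as for purely serial evaluation; only the timing of the calculations changes, at the price of some work that may be thrown away.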

4. Effects of the embodiment

According to this embodiment, a rejection method that combines a plurality of rejection indexes can be configured automatically. This reduces the human cost of combining a plurality of rejection indexes. In addition, according to this embodiment, the misreading rate can be reduced while the correct reading rate is kept at a high level, so a precise and fast rejection method can be configured.

Further, in this embodiment, a plurality of rejection indexes are configured on the basis of their mutual independence, with highly independent indexes arranged in series and indexes with low independence arranged in parallel. This provides a fast rejection method with a high correct reading rate and a low misreading rate at low human cost.

5. Additional remarks
The present invention is not limited to the embodiments described above and includes various modifications. For example, the embodiments above have been described in detail to explain the present invention clearly, and the invention is not necessarily limited to configurations that include every element described. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. For part of the configuration of each embodiment, other configurations can be added, deleted, or substituted.
Some or all of the configurations, functions, processing units, processing means, and the like described above may be realized in hardware, for example by designing them as an integrated circuit. The configurations, functions, and the like described above may also be realized in software, with a processor interpreting and executing a program that realizes each function. Information such as programs, tables, and files that realize each function can be stored in a memory, in a recording device such as a hard disk or an SSD (Solid State Drive), or on a recording medium such as an IC card, an SD card, or a DVD.
The control lines and information lines shown are those considered necessary for the explanation, and not all control lines and information lines of a product are necessarily shown. In practice, almost all of the components may be considered to be connected to one another.

The character recognition method or character recognition device/system of the present invention can be provided as a character recognition program for causing a computer to execute each of its procedures, a computer-readable recording medium on which the character recognition program is recorded, a program product that includes the character recognition program and can be loaded into the internal memory of a computer, a computer such as a server that includes the program, and the like.

201 Character recognition device
202 Input device
203 Display device
204 Image acquisition device
205 Communication device
206 Arithmetic device (CPU)
207 External storage device (HDD, memory)

Claims

A character recognition device comprising:
a plurality of rejection value calculation units, each of which calculates a rejection value by a preset rejection function for a recognition result of a character identified from an input image; and
one or more rejection determination units, each of which determines whether to reject the recognition result based on one or more rejection values calculated by one or more of the plurality of rejection value calculation units,
wherein, using the plurality of rejection value calculation units combined based on the correlation among them, the rejection determination units make a rejection determination on the recognition result based on a plurality of rejection values, the recognition result determined to be rejected is rejected, and the recognition result not determined to be rejected is stored in a storage unit or displayed on a display unit.

The character recognition device according to claim 1,
wherein, when rejection is determined based on the rejection value calculated by a preceding rejection value calculation unit, the calculation of rejection values by the subsequent rejection value calculation units is skipped.

The character recognition device according to claim 1,
wherein the rejection value calculation units that calculate rejection values with a high rejection capability or a high rejection rate are arranged earlier in the calculation processing.

The character recognition device according to claim 1,
wherein the rejection value calculation units that calculate rejection values with high independence from one another are arranged in series to perform the calculation processing.

The character recognition device according to claim 1,
wherein the rejection value calculation units that calculate rejection values with low independence from one another are arranged in parallel to perform the calculation processing.

The character recognition device according to claim 1,
wherein the rejection value calculation units that calculate rejection values with high independence from one another are arranged in series, and the rejection value calculation units that calculate rejection values with low independence from one another are arranged in parallel, to perform the calculation processing.

The character recognition device according to claim 1,
wherein the rejection function is a function that calculates a rejection value that takes a high value for a recognition result to be rejected and a low value for a recognition result not to be rejected.

The character recognition device according to claim 1,
further comprising a rejection image database that collects in advance image samples to be rejected, and
a correct reading image database that collects in advance image samples to be read correctly,
wherein one or more thresholds, against which rejection values are compared to determine rejection, are set so that the rejection rate on the correct reading image database is relatively small and the rejection rate on the rejection image database is relatively large.

The character recognition device according to claim 1,
wherein a rejection value calculation unit with higher rejection efficiency, based on the rejection rate or the rejection value calculation cost, is arranged earlier, and when the rejection determination unit determines rejection based on the rejection value calculated by a preceding rejection value calculation unit, the calculation of rejection values by the subsequent rejection value calculation units is omitted.

The character recognition device according to claim 1,
wherein a new rejection function is determined based on the rejection values of the plurality of rejection value calculation units arranged in parallel, and the rejection determination is performed based on the new rejection function.

The character recognition device according to claim 1,
further comprising a rejection image database that collects in advance image samples to be rejected, and
a correct reading image database that collects in advance image samples to be read correctly,
wherein, in judging the independence of a plurality of rejection values,
a function that takes the rejection values as arguments and discriminates between the image samples stored in the rejection image database and the image samples stored in the correct reading image database is learned using a cost function based on the discrimination error, the discrimination error of the learned function and the discrimination error obtained when the rejection values are arranged in series are determined, and the independence is determined to be low when the difference between the two is greater than or equal to a predetermined threshold and to be high otherwise.

The character recognition device according to claim 1,
wherein rejection values are calculated in parallel by the plurality of rejection value calculation units arranged in parallel and/or by the plurality of rejection value calculation units arranged in series.

The character recognition device according to claim 1, further comprising:
a document imaging unit that obtains a document image by optically scanning a document;
a preprocessing unit that removes noise and background from the document image and binarizes it to generate a binary image;
a layout analysis unit that analyzes the document structure and chart structure of the binary image;
a character string extraction unit that extracts images in units of character strings from the binary image;
a character cutout unit that cuts out images in units of characters from each extracted character string image;
a character identification unit that recognizes the character in each character-unit image cut out by the character cutout unit and outputs the recognition result;
a recognition result selection unit that selects the recognition result of each character string image based on the recognition result of the character identification unit and the rejection determination result of the rejection determination unit;
a retry determination unit that determines, based on the recognition result, whether to perform recognition reprocessing; and
a post-recognition processing unit that stores the recognition result and/or outputs it to a display device.

A character recognition method comprising:
using a plurality of rejection value calculation units, each of which calculates a rejection value by a preset rejection function for a recognition result of a character identified from an input image;
using one or more rejection determination units, each of which determines whether to reject the recognition result based on one or more rejection values calculated by one or more of the plurality of rejection value calculation units; and
using the plurality of rejection value calculation units combined based on the correlation among them, making, by the rejection determination units, a rejection determination on the recognition result based on a plurality of rejection values, rejecting the recognition result determined to be rejected, and storing in a storage unit or displaying on a display unit the recognition result not determined to be rejected.

A character recognition program for causing a computer to execute:
a function in which a processing unit, using a plurality of rejection value calculation units, calculates rejection values by preset rejection functions for a recognition result of a character identified from an input image;
a function in which the processing unit, using one or more rejection determination units, determines whether to reject the recognition result based on one or more rejection values calculated by one or more of the plurality of rejection value calculation units; and
a function in which the processing unit, using the plurality of rejection value calculation units combined based on the correlation among them, causes the rejection determination units to make a rejection determination on the recognition result based on a plurality of rejection values, rejects the recognition result determined to be rejected, and stores in a storage unit or displays on a display unit the recognition result not determined to be rejected.