JP2010061471A

JP2010061471A - Character recognition device and program

Info

Publication number: JP2010061471A
Application number: JP2008227492A
Authority: JP
Inventors: Daisuke Tatsumi; 大祐辰巳
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2008-09-04
Filing date: 2008-09-04
Publication date: 2010-03-18

Abstract

PROBLEM TO BE SOLVED: To reduce an operator's workload in determining whether character recognition results are correct or erroneous.

SOLUTION: In a character recognition device, a character image segmenting part 12 and a character recognizing part 13 perform character recognition processing on a scanned document image to generate character codes. A character image generating part 14 then generates a character image from each character code, and a character image comparing part 15 compares the generated character image with the original character image segmented from the document image and outputs a degree of similarity. A correctness determining part 16 determines whether the recognition result is correct or erroneous based on the degree of similarity. A recognition result outputting part 17 typically retrieves the document image containing characters that are likely to have been misrecognized and, based on the coordinate data of those character images, generates output image data in which they are highlighted for presentation to the user.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a character recognition device and a program.

Character recognition (hereinafter also referred to as OCR) devices are used to extract information from paper documents and are very convenient. However, the accuracy of OCR results is not perfect. OCR results vary with many factors, such as the image quality of the input image, the character segmentation conditions, image conversion during the recognition process, and the OCR dictionary. At present there is no definitive means of determining whether an OCR result is correct, so ultimately a human (an operator who judges correctness) has to check the result against the original document or original image. The operator therefore has to check every input document (image) against its OCR result, which places a heavy burden on the operator.

Patent documents related to the present invention include the following.

Patent Document 1 proposes displaying, in a thumbnail view of all pages, files that contain handwritten characters that could not be recognized in a different color, so that it is easy to see where (in which folder) the unrecognizable characters are. In character recognition (OCR), however, even when some character code is obtained, that character code is not necessarily correct. Consequently, when accuracy is required, the operator still has to check everything.

Patent Document 2 proposes the following for encoding character images. Pairs of a character code and a character image are registered in a registration dictionary under an index. The character image to be encoded is subjected to character recognition, the character image registered in the dictionary for the resulting character code is read out and matched against the character image to be encoded, and the degree of matching is determined. If the degree of matching is at or above a reference level, the character image is encoded using the corresponding index; if it is below the reference level, the character image is registered in the dictionary under a new index and then encoded with that index. The index stream and the registration dictionary are sent to the decoding side and used for decoding.

Patent Documents 3 and 4 propose performing character recognition by using a list in which character images have already been associated with character codes through character recognition processing, and referring to the character images in the list.

These proposals do not determine whether the character recognition result is correct or incorrect.

JP 2006-106905 A
JP 2005-301662 A
JP H05-20500 A
JP 2000-90200 A

An object of the present invention is to provide a character recognition technique that reduces the amount of work an operator must do to determine whether character recognition results are correct.

According to the invention of claim 1, a character recognition device is provided with: character recognition means for executing character recognition processing on a character image to be recognized to generate a character code; character image generation means for generating a character image from the character code generated by the character recognition means; image comparison means for comparing the character image to be recognized with the character image generated by the character image generation means; and determination means for determining whether the recognition result of the character recognition means is correct based on the comparison output of the image comparison means.

According to the invention of claim 2, the character recognition device of claim 1 further includes display means for displaying a character image to be recognized whose recognition result the determination means has determined to be erroneous so that it is distinguished from the rest of the page image containing it.

According to the invention of claim 3, in the character recognition device of claim 1 or 2, the image comparison means compares the character image to be recognized with the character image generated by the character image generation means using a type of feature amount different from the one the character recognition means uses for character recognition.

According to the invention of claim 4, in the character recognition device of any one of claims 1 to 3, the character image generation means generates character images in different types of character fonts, and the image comparison means compares the character image to be recognized with the character images in those different fonts.

According to the invention of claim 5, in the character recognition device of claim 4, the types of character font used by the image comparison means are narrowed down by learning.

According to the invention of claim 6, in the character recognition device of claim 5, the image comparison means performs the narrowing by excluding first the types of character font that have a low correlation with the character image to be recognized.

According to the invention of claim 7, in the character recognition device of claim 5 or 6, the types of character font used by the image comparison means differ for each object type of the characters.

According to the invention of claim 8, in the character recognition device of any one of claims 5 to 7, the learning is performed on character images of characters selected based on their frequency of appearance.

According to the invention of claim 9, in the character recognition device of any one of claims 5 to 7, the learning is performed on character images of characters selected based on the likelihood given by the character recognition means.

According to the invention of claim 10, a character recognition program is provided that causes a computer to function as: character recognition means for executing character recognition processing on a character image to be recognized to generate a character code; character image generation means for generating a character image from the character code generated by the character recognition means; image comparison means for comparing the character image to be recognized with the character image generated by the character image generation means; and determination means for determining whether the recognition result of the character recognition means is correct based on the comparison output of the image comparison means.

The above and other aspects of the present invention are set forth in the claims and are described in detail below with reference to embodiments.

According to the invention of claim 1, the amount of work needed to determine whether character recognition results are correct can be reduced.

According to the invention of claim 2, character images associated with recognition errors in the document image can be identified immediately.

According to the invention of claim 3, errors can be prevented from entering the correctness determination through the same causes that affect character recognition.

According to the invention of claim 4, the correctness determination can be prevented from becoming inaccurate because of the font type of the characters in the document.

According to the invention of claim 5, the correctness determination can be prevented from becoming inaccurate because of the font type of the characters in the document and, in addition, the amount of character image matching work can be reduced.

According to the invention of claim 6, the correctness determination can be prevented from becoming inaccurate because of the font type of the characters in the document, the amount of character image matching work can be reduced, and erroneous determinations caused by narrowing the number of font types too abruptly can be prevented.

According to the invention of claim 7, the correctness determination can be prevented from becoming inaccurate because of the font type of the characters in each region of the document.

According to the invention of claim 8, the work needed to optimize the font types can be reduced.

According to the invention of claim 9, the work needed to optimize the font types can be reduced.

According to the invention of claim 10, the amount of work needed to determine whether character recognition results are correct can be reduced.

Embodiments of the present invention are described below.

First, Embodiment 1 of the present invention is described.

FIG. 1 shows the overall character recognition system 400 of Embodiment 1 of the present invention. Although the character recognition system 400 is implemented here on a single computer system, typically a personal computer 200, it may also be realized using a plurality of computer systems connected via a communication network, or using various other information devices.

In FIG. 1, an image reading apparatus 100 and the personal computer 200 are connected to a communication network 300. The image reading apparatus 100 optically scans a document to acquire a document image, and typically has an automatic document feeder. A character recognition program is installed on the personal computer 200 from a recording medium 201 or via a communication medium to realize the character recognition system 400. The personal computer 200 includes a well-known CPU, main memory, bus, input/output devices, and the like, and realizes the various functions of the character recognition system 400 through cooperation of these hardware resources and the program. The functions are shown as functional blocks in the figure.

The character recognition system 400 includes an image input unit 10, an image storage unit 11, a character image cutout unit 12, a character recognition unit 13, a character image generation unit 14, a character image comparison unit 15, a correctness determination unit 16, a recognition result output unit 17, a display unit 18, and the like.

The image input unit 10 stores the document image received from the image reading apparatus 100 via the communication network 300 in the image storage unit 11. Document images are managed, for example, in units of pages. The image input unit 10 supplies image data in units of pages or page portions to the character image cutout unit 12. The character image cutout unit 12 cuts out each character image from the image data and supplies the character image data and its coordinate data to the character recognition unit 13.

The character recognition unit 13 analyzes a character image by any suitable method to generate a character code (character identifier); it performs character recognition using pattern matching against standard templates or various feature amounts, typically directional contribution and stroke features. The character recognition result (character code) is supplied to the recognition result output unit 17 together with the coordinate data of the cutout position, and is also supplied to the character image generation unit 14.

The character image generation unit 14 holds font data, generates character image data from the font data (outline data, pixel data, or the like) corresponding to the character code, and sends it to the character image comparison unit 15. The character image comparison unit 15 compares the character image cut out from the document image with the character image generated from the recognized character code, using a method different from the character recognition method of the character recognition unit 13. The comparison is based on, but not limited to, feature amounts such as the pixel distribution, the positional relationship of contact points between black pixels, and line segments. For example, when handwritten numerals are recognized from the distribution of black pixels (the absence of black pixels in a predetermined region) as shown in FIG. 3, the comparison is performed by template matching as shown in FIG. 4. The character image comparison unit 15 outputs its comparison result as a degree of similarity (also called a degree of correlation or degree of coincidence).

The correctness determination unit 16 determines that the character recognition is correct when the degree of correlation or coincidence from the character image comparison unit 15 exceeds a predetermined threshold, and determines that it is erroneous otherwise.

The recognition result output unit 17 receives the position data of each misrecognized character image together with the document image of the page or page portion, generates a document image in which the misrecognized character images are highlighted, and supplies it to the display unit 18. The display unit 18 presents this highlighted document image to the user. The recognition result output unit 17 typically, though not necessarily, holds for each character image a record containing the character code, the coordinate position, and a flag indicating whether the recognition is correct, and generates the highlighted document image from these records.
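As a minimal sketch of the per-character record described above, the following Python models each record as a character code, a coordinate box, and a correctness flag, and marks the boxes of misrecognized characters in a page. The names `CharRecord` and `highlight_errors`, the text-based page representation (`#` for black, `.` for white), and the `*` highlight mark are illustrative assumptions and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CharRecord:
    code: str                       # character code produced by recognition
    box: Tuple[int, int, int, int]  # (left, top, right, bottom); right and bottom are exclusive
    correct: bool                   # flag set by the correctness determination


def highlight_errors(page: List[str], records: List[CharRecord], mark: str = "*") -> List[str]:
    """Return a copy of the page with the background of each misrecognized box marked."""
    rows = [list(row) for row in page]
    for record in records:
        if record.correct:
            continue  # only misrecognized characters are highlighted
        left, top, right, bottom = record.box
        for y in range(top, bottom):
            for x in range(left, right):
                if rows[y][x] == ".":
                    rows[y][x] = mark
    return ["".join(row) for row in rows]


# Hypothetical two-character page: the second character is flagged as an error.
page = ["..#..#", "..#..#"]
records = [CharRecord("1", (0, 0, 3, 2), True), CharRecord("7", (3, 0, 6, 2), False)]
highlighted = highlight_errors(page, records)
```

In a real system the highlight would be a display attribute on the page image rather than a replaced character; the text raster only keeps the sketch self-contained.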

FIG. 2 illustrates an operation example of the embodiment of FIG. 1.

Reference is mainly made to FIG. 2. First, the scanned document image is subjected to character recognition processing using the character image cutout unit 12 and the character recognition unit 13 to generate character codes (S10). Next, the character image generation unit 14 generates a character image from each character code (S11). The character image comparison unit 15 then compares the character image generated from the character code with the original character image cut out from the document image and outputs a degree of similarity (S12). The correctness determination unit 16 determines whether the recognition result is correct based on the degree of similarity (S13). The recognition result output unit 17 typically retrieves the document image containing the misrecognized characters and, based on the coordinate data of the misrecognized character images, generates output image data in which those character images are highlighted, and displays it to the user (S15). In this example, only documents, pages, or page portions that contain errors are displayed, so that document image portions unnecessary for correction are not shown and the amount of correction work is reduced; however, pages and the like in which no recognition error was found may also be displayed.
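The steps S10 to S13 above can be sketched as a loop over cut-out character images. This is a hedged illustration only: the function `verify_characters`, the toy two-glyph `FONT`, the recognizer stub that always answers "I", the pixel-agreement measure, and the threshold 0.8 are all assumptions made for the example, not details from the patent.

```python
from typing import Callable, List, Sequence, Tuple

Bitmap = Tuple[str, ...]  # rows of '#' (black) and '.' (white); an assumed toy format


def verify_characters(
    char_images: Sequence[Bitmap],
    recognize: Callable[[Bitmap], str],
    render: Callable[[str], Bitmap],
    compare: Callable[[Bitmap, Bitmap], float],
    threshold: float,
) -> List[Tuple[str, float, bool]]:
    """Run S10 to S13 for each cut-out character image."""
    results = []
    for image in char_images:
        code = recognize(image)                 # S10: character code from recognition
        generated = render(code)                # S11: image generated from the code
        similarity = compare(image, generated)  # S12: similarity of the two images
        is_correct = similarity > threshold     # S13: correct only above the threshold
        results.append((code, similarity, is_correct))
    return results


# Toy stand-ins (hypothetical): a two-glyph font and a recognizer that always answers "I".
FONT = {"I": (".#.", ".#.", ".#."), "L": ("#..", "#..", "###")}


def pixel_agreement(a: Bitmap, b: Bitmap) -> float:
    """Fraction of pixel positions at which the two bitmaps agree."""
    pairs = [(p, q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b)]
    return sum(p == q for p, q in pairs) / len(pairs)


results = verify_characters(
    [FONT["I"], FONT["L"]],
    recognize=lambda image: "I",
    render=lambda code: FONT[code],
    compare=pixel_agreement,
    threshold=0.8,
)
# Indices of characters that the output step (S15) would highlight.
flagged = [index for index, (_, _, ok) in enumerate(results) if not ok]
```

Passing the recognizer, renderer, and comparison as callables mirrors the requirement that the comparison method differ from the recognition method: either can be swapped without touching the loop.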

FIGS. 3 and 4 show a simplified example.

In FIG. 3, character images of handwritten numerals are recognized according to whether there are pixels in a predetermined region. When there are no black pixels in the stippled region shown in FIG. 3(a), the character is recognized as "6". FIG. 3(b) shows an example in which a "6" is correctly recognized as "6". In the case of the character image of "5" shown in FIG. 3(c), however, there are also no black pixels in the stippled region, so it is erroneously recognized as "6".
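The failure mode of FIG. 3 can be reproduced with a small sketch. The 5x7 bitmaps and the position of the blank region below are invented for illustration (the figure itself is not reproduced in this text), and `recognize_by_blank_region` is a hypothetical name.

```python
from typing import Optional, Sequence

# Assumed toy bitmaps: '#' is a black pixel, '.' is white.
SIX = (
    ".###.",
    "#....",
    "#....",
    "####.",
    "#...#",
    "#...#",
    ".###.",
)
FIVE = (
    "#####",
    "#....",
    "#....",
    "####.",
    "....#",
    "....#",
    "####.",
)

# Assumed test region: the upper-right area that a "6" leaves empty.
BLANK_REGION = [(x, y) for y in (1, 2) for x in (2, 3, 4)]


def recognize_by_blank_region(bitmap: Sequence[str]) -> Optional[str]:
    """Answer "6" when the test region holds no black pixel, otherwise give no answer."""
    if all(bitmap[y][x] == "." for x, y in BLANK_REGION):
        return "6"
    return None


six_result = recognize_by_blank_region(SIX)    # correct recognition
five_result = recognize_by_blank_region(FIVE)  # misrecognition: the region is empty here too
```

Both calls return "6", which is exactly why a second check based on a different feature is needed.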

FIG. 4 shows the character image comparison method. In this example, template matching is performed to obtain the degree of similarity (degree of correlation). The cut-out character image of "6" has a high correlation with the template for "6", so the recognition result is determined to be correct. In contrast, the cut-out character image of "5" has a low correlation with the template for "6", so the recognition result is determined to be erroneous. Then, as shown in FIG. 5, the character image portion of the misrecognized "5" is highlighted with a display attribute different from that of the other portions.
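A hedged sketch of the comparison in FIG. 4 follows. The source does not specify the correlation measure, so this example substitutes intersection over union of black pixels as a simple stand-in; the bitmaps, the function names, and the threshold 0.9 are assumptions.

```python
from typing import Sequence, Set, Tuple

# Assumed toy bitmaps: '#' is a black pixel, '.' is white.
SIX = (".###.", "#....", "#....", "####.", "#...#", "#...#", ".###.")
FIVE = ("#####", "#....", "#....", "####.", "....#", "....#", "####.")


def black_pixels(bitmap: Sequence[str]) -> Set[Tuple[int, int]]:
    """Coordinates (x, y) of all black pixels in the bitmap."""
    return {(x, y) for y, row in enumerate(bitmap) for x, ch in enumerate(row) if ch == "#"}


def template_similarity(image: Sequence[str], template: Sequence[str]) -> float:
    """Intersection over union of black pixels; 1.0 means identical shapes."""
    a, b = black_pixels(image), black_pixels(template)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


THRESHOLD = 0.9  # assumed value; the patent only says "a predetermined threshold"

# Both images were recognized as "6", so both are compared with the "6" template.
six_ok = template_similarity(SIX, SIX) > THRESHOLD    # recognition judged correct
five_ok = template_similarity(FIVE, SIX) > THRESHOLD  # recognition judged erroneous
```

On bitmaps this small the "5" still overlaps the "6" template substantially, so the threshold has to be set fairly high; real character images at scan resolution would separate more clearly.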

If the character codes of the recognition result are accompanied by coordinate position data, each character image in the document image can be associated with the corresponding character code of the recognition result. For example, when a highlighted portion is designated with a pointing device, the corresponding portion in the text display of the recognition result is highlighted in the same way based on its coordinate data, and the operator can make the correction in the text display.
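The link between a pointed position on the page image and the recognized text can be sketched as a box lookup. The function name `index_at`, the box layout, and the sample coordinates are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right and bottom are exclusive


def index_at(boxes: List[Box], x: int, y: int) -> Optional[int]:
    """Return the index of the character whose box contains the point, or None."""
    for index, (left, top, right, bottom) in enumerate(boxes):
        if left <= x < right and top <= y < bottom:
            return index
    return None


# Recognized text and the box of each character on the page image (assumed values).
text = "166"
boxes = [(0, 0, 10, 12), (10, 0, 20, 12), (20, 0, 30, 12)]

clicked = index_at(boxes, 14, 5)  # the operator points at the second character
corrected = text[:clicked] + "5" + text[clicked + 1:] if clicked is not None else text
```

Because the boxes and the text share the same index, the same lookup works in the opposite direction: a position in the text display selects the box to highlight on the page image.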

Similarly, if the character codes of the recognition result are accompanied by correctness flags, the portions containing recognition errors can be highlighted when the recognition result is displayed, and pointing at such a portion, for example, can highlight the corresponding portion in the document image in the same way.

In this embodiment, a paper document is scanned using a device that has an image input function, such as a copier. Character codes are obtained by performing OCR processing on the scanned document image with a device that has a character recognition function. Character images are generated from the obtained character codes. Separately, character images are cut out from the scanned document image. The two kinds of character image obtained in this way (the character image generated from the character code of the OCR result, and the character image cut out from the scanned document image) are compared. If the similarity between the two is high, the character recognition result of the OCR device is determined to be correct. Conversely, if the similarity is low, the character recognition result is determined to be possibly erroneous. Typically, only documents containing characters determined to be possibly erroneous are presented to the operator (the person who judges correctness), and those characters are clearly indicated at the same time. Conventionally, the operator had to confirm the correctness of the character recognition result in every region of every document, whereas with the processing of this embodiment the operator only needs to check the indicated characters.

Next, a character recognition system 401 according to Embodiment 2 of the present invention is described.

In addition to the configuration of the character recognition system 400 of Embodiment 1 in FIG. 1, Embodiment 2 is provided with a generation control unit 19 that adjusts the character images generated by the character image generation unit 14, and a character image deformation unit 20 that applies various operations to the character images cut out from the document image.

FIG. 6 shows the overall character recognition system 401 of Embodiment 2; in this figure, parts corresponding to those in FIG. 1 are given corresponding reference numerals.

In this embodiment, font data of a plurality of types is prepared, and each cut-out character image is matched against the character images of the corresponding character in the various font sets, so that the correctness of the character recognition result can be determined accurately regardless of the character font of the document. In addition, at the start of character recognition the cut-out images are matched against character images in a wide range of font types, and during the process the font types are narrowed down to those with a high likelihood of matching, so that the amount of matching work is gradually reduced.

Furthermore, binarization threshold processing, thinning, dilation/erosion processing, and the like are applied to the cut-out character images so that the character images can be compared appropriately.

The details are described below.

In FIG. 6, the generation control unit 19 specifies the font types of the character images generated by the character image generation unit 14. The generation control unit 19 receives from the correctness determination unit 16 correctness data indicating whether the recognition result is correct and with which font type the comparison produced that result, and adapts (learns) the font types to be generated. This point is described in detail later with reference to FIGS. 7 to 9. By adapting the font types to be generated, the number of font types used for comparing character images is narrowed down and the amount of matching processing is reduced.

The character image deformation unit 20 receives correctness data indicating whether the recognition result is correct from the correctness determination unit 16 and deforms the cut-out character image to optimize the comparison. For example, when the cut-out character image is in bold type, the degree of coincidence would normally tend to be low; thinning the image keeps the degree of coincidence sufficiently high, so that a correct recognition is not mistakenly determined to be erroneous. The character image may also be rotated to allow for skew of the document and the like.
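The effect of slimming a bold stroke before comparison can be sketched with a single pass of binary erosion. Erosion is used here as a simple stand-in for the thinning the text mentions (a true thinning algorithm preserves stroke ends and connectivity, which erosion does not); the function name `erode` and the sample bitmap are assumptions.

```python
from typing import List, Sequence


def erode(bitmap: Sequence[str]) -> List[str]:
    """Keep a black pixel only if it and its four direct neighbours are all black."""
    height, width = len(bitmap), len(bitmap[0])

    def is_black(x: int, y: int) -> bool:
        # Pixels outside the image count as white.
        return 0 <= x < width and 0 <= y < height and bitmap[y][x] == "#"

    eroded = []
    for y in range(height):
        row = ""
        for x in range(width):
            keep = (
                is_black(x, y)
                and is_black(x - 1, y)
                and is_black(x + 1, y)
                and is_black(x, y - 1)
                and is_black(x, y + 1)
            )
            row += "#" if keep else "."
        eroded.append(row)
    return eroded


# A bold vertical stroke, three pixels wide (assumed sample).
bold_stroke = [
    ".....",
    ".###.",
    ".###.",
    ".###.",
    ".###.",
    ".....",
]
slim_stroke = erode(bold_stroke)
```

The stroke becomes one pixel wide but also loses a pixel at each end, which is the trade-off of using plain erosion instead of a proper thinning algorithm.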

Next, font optimization in general is described with reference to FIG. 7.

FIG. 7 shows an example in which the comparison is performed using character images generated from font data of a plurality of font types. In this example, character images are initially generated in all fonts and compared one after another. In the illustrated example, the cut-out character image is recognized to obtain the character code for "あ", and the character images of "あ" in the various fonts are output and matched in turn.

As character recognition progresses, the number of fonts used may be reduced gradually, either after each recognized character or after each predetermined number of recognized characters (each such step is also called a recognition stage). For example, the number of fonts is halved at each recognition stage until it reaches the desired number, typically one. The fonts retained are those with the highest similarity.
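The halving schedule described above can be sketched as follows. The function name `narrow_fonts`, the font names, and the similarity values are hypothetical; how the per-font similarity is accumulated over a recognition stage is not specified in the source and is simply taken as given here.

```python
from typing import Dict, List


def narrow_fonts(fonts: List[str], similarity: Dict[str, float], min_fonts: int = 1) -> List[str]:
    """Keep the better-matching half of the fonts, never fewer than min_fonts."""
    keep = max(min_fonts, len(fonts) // 2)
    ranked = sorted(fonts, key=lambda font: similarity[font], reverse=True)
    return ranked[:keep]


# Hypothetical fonts and the similarity each achieved during the current recognition stage.
similarity = {"Mincho": 0.9, "Gothic": 0.6, "Maru": 0.7, "Kyokasho": 0.2}

stage0 = list(similarity)                 # start with every available font
stage1 = narrow_fonts(stage0, similarity) # four fonts become two
stage2 = narrow_fonts(stage1, similarity) # two fonts become one
stage3 = narrow_fonts(stage2, similarity) # stays at the desired minimum of one
```

Halving rather than jumping straight to the single best font matches the stated aim of avoiding erroneous determinations caused by narrowing too abruptly.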

出現頻度の大きな文字のみを基準にしてフォントの最適化を行うようにしてもよい。 You may make it optimize a font only on the character with a high appearance frequency.

FIG. 8 shows an example in which the layout is analyzed to identify objects, each character image is classified as belonging to body text, a subsection heading, a title, or the like, and font optimization is performed separately for each object type. In this example as well, font optimization may be based only on characters that appear frequently.

FIG. 9 shows an example in which font optimization is based on character images with a high recognition confidence (likelihood). In this example, characters with high confidence and their coordinate positions are stored and used as the basis for optimizing (narrowing down) the font types.

FIG. 10 shows an example of character image comparison. In this example, a character image of "あ" is segmented, and the cases where the character recognition result is "あ", "め", or "お" are shown. The segmented character image is compared with the character image generated from the character code of "あ", "め", or "お". In this comparison, the similarity is calculated by a method different from the one used for character recognition. As described above, typical calculation methods are based on pixel distribution, the positional relationship of contact points, line segments, and the like. Even when the recognition result "あ" is correct, the similarity may be low depending on the appearance of the segmented character image, and the result could be wrongly judged to be a misrecognition. The segmented character image is therefore preprocessed by the character image deformation unit 20 so that the similarity does not become spuriously low. Preprocessing includes, for example, binarization, thinning, dilation and erosion, and rotation, but is not limited to these.
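The effect of preprocessing on similarity can be illustrated with a small sketch. Everything here is an assumption made for illustration: the images are tiny binary grids, the similarity is a Jaccard overlap of ink pixels (a stand-in for the pixel-distribution measures mentioned above), and `erode` is a simple 4-neighbour erosion used in place of a real thinning algorithm.

```python
def jaccard(a, b):
    """Overlap of ink pixels: |A and B| / |A or B| for two equally sized binary images."""
    pairs = [(pa, pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    inter = sum(pa & pb for pa, pb in pairs)
    union = sum(pa | pb for pa, pb in pairs)
    return inter / union if union else 1.0


def erode(img):
    """Keep a pixel only if it and all of its in-bounds 4-neighbours are ink."""
    h, w = len(img), len(img[0])

    def keep(r, c):
        cells = [(r, c), (r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
        return all(img[y][x] for y, x in cells if 0 <= y < h and 0 <= x < w)

    return [[1 if keep(r, c) else 0 for c in range(w)] for r in range(h)]


# Hypothetical 5x5 glyphs: a one-pixel-wide stroke generated from the font,
# and a bold (three-pixel-wide) stroke segmented from the scan.
generated = [[0, 0, 1, 0, 0] for _ in range(5)]
bold_input = [[0, 1, 1, 1, 0] for _ in range(5)]

before = jaccard(bold_input, generated)        # 5 shared pixels out of 15
after = jaccard(erode(bold_input), generated)  # erosion removes the extra columns
```

In this toy case the similarity rises from one third to 1.0 after erosion, which mirrors the reason the text gives for preprocessing bold input before comparison.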

The scope of the invention is determined by the claims and is not limited to the specific configurations, problems, and effects of the embodiments. The invention is not limited to the embodiments described above, and various modifications are possible without departing from its spirit. Some technical features of the invention are listed below. The invention is, of course, not limited to these features.

The image features used in the character image comparison typically differ from the features used in OCR processing, but they may be the same.

Input character images (character images segmented from the scanned image) whose recognition result has been judged correct may be superimposed one after another, and the superimposed character image may be used for subsequent comparison and determination.
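One way to picture the superimposition is a pixel-wise vote over the accepted images. The sketch below is an assumption about how such a merge could work, not the method stated in the text: `superimpose`, the majority `ratio`, and the sample grids are hypothetical.

```python
def superimpose(images, ratio=0.5):
    """Merge equally sized binary images: a pixel is ink if more than
    `ratio` of the images have ink at that position."""
    n = len(images)
    h, w = len(images[0]), len(images[0][0])
    return [
        [1 if sum(img[r][c] for img in images) > ratio * n else 0 for c in range(w)]
        for r in range(h)
    ]


# Three hypothetical scans of the same character, two with a stray noise pixel.
samples = [
    [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    [[0, 1, 0], [0, 1, 1], [0, 1, 0]],
    [[0, 1, 0], [0, 1, 0], [1, 1, 0]],
]
reference = superimpose(samples)
```

Noise that appears in only one sample is voted out, so the merged image is a cleaner reference for later comparisons.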

When images are generated from the character codes obtained by OCR, character images are created for all fonts available in the device (a PC or the like). As the number of compared characters increases, fonts are dropped from the candidates starting with those of lowest similarity, until a single font remains.

During character recognition, character frequency information is acquired, and the font is determined using frequently occurring characters. Superimposing the images of a frequently occurring character yields a character image with little noise, so the similarity to the character image generated from the character code can be calculated with high accuracy.

During character recognition, the font is determined using characters with high recognition confidence. If the font cannot be determined using the character code with the highest recognition confidence, the fonts are narrowed down using the character codes with the second highest, third highest, and subsequent confidence.

Object analysis is performed on the scanned document, and the candidate fonts are switched for each object type, such as title, chapter or section heading, and body text. Even for the same character code, the comparison uses the font presumed most suitable for each object, for example a bold font for a title and a Mincho font for body text.

For a character code with a low appearance frequency, few corresponding input character images (character images segmented from the scanned document) are available. The segmented character image is therefore duplicated several times, a different image processing operation is applied to each copy, and character recognition is then performed on each copy. If the character code output as the recognition result is the same across these operations (that is, if the result is robust), the recognition result is judged to be correct.
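The robustness check can be sketched as a comparison of recognition results across several processed copies. The code below is only a toy under stated assumptions: `toy_recognize` is a stand-in for a real OCR engine that classifies by ink amount alone, and `smear_right` is a crude stand-in for a dilation step. Neither comes from the source.

```python
def recognition_is_robust(image, transforms, recognize):
    """True if every processed copy of the image yields the same character code."""
    codes = {recognize(transform(image)) for transform in transforms}
    return len(codes) == 1


def toy_recognize(img):
    """Hypothetical recognizer: decides by the number of ink pixels only."""
    ink = sum(map(sum, img))
    return "1" if ink <= 3 else "7"


def identity(img):
    return img


def smear_right(img):
    """Crude dilation: each pixel also takes the value of its left neighbour."""
    return [[p | (row[i - 1] if i > 0 else 0) for i, p in enumerate(row)] for row in img]


seven = [[1, 1, 1], [0, 0, 1], [0, 0, 1]]
one = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]

seven_robust = recognition_is_robust(seven, [identity, smear_right], toy_recognize)
one_robust = recognition_is_robust(one, [identity, smear_right], toy_recognize)
```

Here the "7" image yields the same code after both operations, while the "1" image flips to a different code once smeared, so only the former would be accepted as correct under this check.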

For a character code with a low appearance frequency, a plurality of character images with deliberately added image distortion are generated from the single character code. Each generated character image is compared with the input character image, and if the similarity is low for every generated character image, the recognition result is judged to be incorrect.

If the result is judged to be an error after image comparison and correctness determination have been performed once, different image processing methods or parameters, such as noise removal, a changed binarization threshold, thinning, or dilation and erosion, are applied to the character image generated from the character code, to the character image segmented from the input image, or to both, and image comparison and correctness determination are performed again. If the similarity is low in every case, the recognition result is judged to be incorrect.
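The retry loop can be sketched with one of the listed parameters, the binarization threshold. This is an illustrative sketch only: the function names, the pixel-agreement similarity, the darkness scale (0 to 255, higher meaning darker), and all numeric values are assumptions.

```python
def binarize(gray, threshold):
    """Mark a pixel as ink when its darkness value reaches the threshold."""
    return [[1 if p >= threshold else 0 for p in row] for row in gray]


def match_ratio(a, b):
    """Fraction of pixel positions where two binary images agree."""
    total = sum(len(row) for row in a)
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


def judge_with_retries(gray, generated, thresholds, min_similarity):
    """Try each threshold in order; return the first one that yields a
    sufficient match, or None if every attempt stays below the limit."""
    for threshold in thresholds:
        if match_ratio(binarize(gray, threshold), generated) >= min_similarity:
            return threshold
    return None


generated = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
# Hypothetical faint scan: the stroke is around 110 to 130, with a smudge at 90.
faint_scan = [[10, 120, 20], [15, 110, 90], [5, 130, 10]]

accepted_at = judge_with_retries(faint_scan, generated, [200, 100], 0.9)
rejected = judge_with_retries(faint_scan, generated, [200, 150], 0.9)
```

The first threshold misses the faint stroke, the second recovers it, and the character is accepted. When no parameter setting reaches the limit, the result stays flagged as a possible error.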

At scan time, images at multiple resolutions, such as standard and high resolution, are prepared. If the result is judged to be an error after image comparison and correctness determination have been performed once, the character is segmented from the high-resolution scanned image, and image comparison and correctness determination are performed again (OCR processing can be omitted). Because a low-noise character image of any size can be generated from a font, the higher resolution is advantageous for similarity calculation.

The above processing is performed one document (one page) at a time. When processing of the next document (page) begins, the methods and parameters used for the previous document are tried first.

If a document (page) contains no characters judged to be possibly erroneous, that document is not displayed. In other words, only documents that may contain errors are displayed.

If a large proportion of the characters in a document are judged to be possibly erroneous, reprocessing is attempted using that document alone. For reprocessing, settings such as the character code frequency information, the superimposition of input character images, and the font specification for each object type are changed. Of the original result and the reprocessed result, the one with the higher proportion of characters judged to be correctly recognized is displayed.

If the proportion of characters judged to be correctly recognized is about the same after reprocessing, only characters judged correct in both the original processing and the reprocessing are treated as correctly recognized, and all other characters are displayed as possibly erroneous.
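The choice between the original and reprocessed results can be sketched as below. The sketch assumes each result is a mapping from a character position to a correct or incorrect flag. The name `merge_results` and the `margin` used to decide what counts as "about the same" are hypothetical, since the text gives no numeric criterion.

```python
def merge_results(original_ok, reprocessed_ok, margin=0.05):
    """Pick the result with the higher share of characters judged correct.
    If the shares differ by at most `margin`, accept only characters that
    were judged correct in both passes."""
    def share(flags):
        return sum(flags.values()) / len(flags)

    r0, r1 = share(original_ok), share(reprocessed_ok)
    if abs(r0 - r1) <= margin:
        return {k: original_ok[k] and reprocessed_ok[k] for k in original_ok}
    return original_ok if r0 > r1 else reprocessed_ok


# Hypothetical per-character judgements for one page (position -> judged correct).
original = {0: True, 1: True, 2: False, 3: True}
reprocessed_similar = {0: True, 1: False, 2: True, 3: True}
reprocessed_better = {0: True, 1: True, 2: True, 3: True}

conservative = merge_results(original, reprocessed_similar)
preferred = merge_results(original, reprocessed_better)
```

With equal shares, only positions 0 and 3 remain accepted and the rest would be shown as possibly erroneous. When reprocessing is clearly better, its result is used as is.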

If a large proportion of the characters in a document are judged to be possibly erroneous and different determination conditions were used in the past, reprocessing is attempted using those conditions. If reprocessing does not improve the result, a search for different parameters is performed again.

FIG. 1 is a block diagram illustrating the configuration of Embodiment 1 of the invention.
FIG. 2 is a diagram illustrating an operation example of Embodiment 1.
FIG. 3 is a diagram illustrating a simplified example of character recognition in Embodiment 1.
FIG. 4 is a diagram illustrating a simplified example of character image comparison in Embodiment 1.
FIG. 5 is a diagram illustrating a display example of the correctness determination result in Embodiment 1.
FIG. 6 is a block diagram illustrating the configuration of Embodiment 2 of the invention.
FIG. 7 is a diagram illustrating a font determination method in Embodiment 2.
FIG. 8 is a diagram illustrating another font determination method in Embodiment 2.
FIG. 9 is a diagram illustrating another font determination method in Embodiment 2.
FIG. 10 is a diagram illustrating character image comparison and character image preprocessing in Embodiment 2.

Explanation of symbols

10 Image input unit
11 Image storage unit
12 Character image segmentation unit
13 Character recognition unit
14 Character image generation unit
15 Character image comparison unit
16 Correctness determination unit
17 Recognition result output unit
18 Display unit
19 Generation control unit
20 Character image deformation unit
100 Image reading device
200 Personal computer
201 Recording medium
300 Communication network
400 Character recognition system
401 Character recognition system

Claims

A character recognition device comprising:
character recognition means for generating a character code by executing character recognition processing on a character image to be recognized;
character image generation means for generating a character image from the character code generated by the character recognition means;
image comparison means for comparing the character image to be recognized with the character image generated by the character image generation means; and
determination means for determining whether the recognition result of the character recognition means is correct based on the comparison output of the image comparison means.

The character recognition device according to claim 1, further comprising display means for displaying a character image to be recognized whose recognition result has been determined to be erroneous, so that it is distinguished from the others within the page image containing that character image.

The character recognition device according to claim 1 or 2, wherein the image comparison means compares the character image to be recognized with the character image generated by the character image generation means using a feature quantity of a type different from the feature quantity used by the character recognition means for character recognition.

The character recognition device according to any one of the preceding claims, wherein the character image generation means generates character images in character fonts of different types, and the image comparison means compares the character image to be recognized with the character images in the character fonts of different types.

The character recognition device according to claim 4, wherein the type of the character font employed by the image comparison means is narrowed down by learning.

The character recognition device according to claim 5, wherein the image comparison means performs the narrowing down by excluding character font types having a low correlation with the character image to be recognized.

The character recognition device according to claim 5 or 6, wherein the type of the character font employed by the image comparison means differs for each character object type.

The character recognition device according to claim 5, wherein the learning is performed on a character image of a character selected based on the appearance frequency.

The character recognition device according to claim 5, wherein the learning is performed on a character image of a character selected based on the likelihood of the character recognition means.

A character recognition program for causing a computer to function as:
character recognition means for generating a character code by executing character recognition processing on a character image to be recognized;
character image generation means for generating a character image from the character code generated by the character recognition means;
image comparison means for comparing the character image to be recognized with the character image generated by the character image generation means; and
determination means for determining whether the recognition result of the character recognition means is correct based on the comparison output of the image comparison means.