JP2023046687A

JP2023046687A - Information processing device, information processing method and program

Info

Publication number: JP2023046687A
Application number: JP2021155411A
Authority: JP
Inventors: 頌二田中; Shoji Tanaka
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2021-09-24
Filing date: 2021-09-24
Publication date: 2023-04-05

Abstract

PROBLEM TO BE SOLVED: To improve the accuracy of specifying a character region. SOLUTION: An information processing device extracts a character string region from document image data, determines for each unit character region whether the characters it contains are handwritten or printed, and, according to the determination result, executes OCR using a handwritten character recognition dictionary (handwritten character OCR) or OCR using a printed character recognition dictionary (printed character OCR). The device then specifies a character region to be subjected to character region regeneration (hereinafter, a correction target region), selects a reference character region, which is a unit character region of printed characters needed to regenerate the unit character regions, determines a search size from the selected reference character region, and searches for printed characters in the correction target region on the basis of that search size to regenerate the unit character regions. SELECTED DRAWING: Figure 7

Description

The present invention relates to character recognition technology for extracting characters from images.

Conventionally, OCR (Optical Character Recognition) processing is widely known as a technology that performs character recognition on a scanned image, obtained by scanning a document containing characters, and converts the characters into character codes usable by a computer. OCR processing makes it possible to automate the work of converting paper forms into digital data, typified by the expense reimbursement work carried out in ordinary offices, and is expected to improve productivity in data entry work.

Receipts handled in expense reimbursement work often have handwritten characters written into a printed standard format, so many of them contain a mixture of printed and handwritten characters. For example, in a receipt issued at a store, characters are handwritten into preprinted entry fields such as the amount, the date, and the proviso.

In general, character recognition processing differs between printed and handwritten characters. For a document image in which the two are mixed, it is therefore necessary to distinguish printed characters from handwritten characters correctly and to apply the appropriate character recognition processing to each.

In Patent Document 1, handwritten/printed determination is first performed based on the shape of each character. Printed character recognition is performed on characters determined to be printed, and handwritten character recognition is performed on characters determined to be handwritten. If the reliability of that recognition result is higher than a predetermined threshold, the result is adopted. If the reliability is lower than the threshold, character recognition is performed again using the other character recognition means, and the result with the higher reliability of the two is selected. This increases the reliability of character recognition results for document images that contain both printed and handwritten characters.
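The threshold-and-fallback flow described for Patent Document 1 can be sketched roughly as follows. This is only an illustration of the description above, not code from either publication: the function name `recognize_with_fallback`, the recognizer callables, the 0-to-1 confidence scale, and the default threshold of 0.8 are all assumptions.

```python
from typing import Callable, Tuple

# Hypothetical recognizer signature: takes a character image and
# returns (recognized text, confidence between 0 and 1).
Recognizer = Callable[[object], Tuple[str, float]]


def recognize_with_fallback(
    image: object,
    is_handwritten: bool,
    handwritten_ocr: Recognizer,
    printed_ocr: Recognizer,
    threshold: float = 0.8,  # assumed value; the source only says "predetermined"
) -> Tuple[str, float]:
    """Run the OCR chosen by the handwritten/printed determination first.

    If its confidence exceeds the threshold, adopt the result. Otherwise
    run the other OCR as well and keep whichever result is more confident.
    """
    if is_handwritten:
        primary, secondary = handwritten_ocr, printed_ocr
    else:
        primary, secondary = printed_ocr, handwritten_ocr

    text, conf = primary(image)
    if conf > threshold:
        return text, conf

    alt_text, alt_conf = secondary(image)
    if alt_conf > conf:
        return alt_text, alt_conf
    return text, conf
```

Note that this flow assumes each input image already contains exactly one character, which is the assumption that breaks down when segmentation fails.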

Japanese Patent Application Laid-Open No. 2001-143020
Japanese Patent Application Laid-Open No. 2010-218106

However, the method of Patent Document 1 has the problem that, if specification of the character regions (that is, character segmentation) fails, handwritten/printed determination cannot be performed correctly for each character, and character recognition accuracy decreases. For example, consider the character string shown in FIG. 8, which contains the printed characters "発行日 年 月 日" (issue date, year, month, day) and the handwritten characters "2024", "6", and "21". In this string, the handwritten "4" and the printed "年", which are close together, may be wrongly segmented as a single character. When specification of the character region fails in this way, the method of Patent Document 1 performs handwritten/printed determination on the character region corresponding to "4年", and the subsequent character recognition processing fails whether the region is determined to be handwritten or printed.

Accordingly, an object of the present invention is to improve the accuracy of specifying a character region.

The present invention is an information processing device comprising: extraction means for extracting, from a character string region that consists of a plurality of characters and is included in a read image obtained by reading a document, unit character region candidates corresponding to each of the plurality of characters; recognition means for performing character recognition processing for handwritten characters or for printed characters on the unit character region candidates to obtain a character recognition result and its reliability; and correction means for correcting, using a reference unit character region candidate, the unit character region candidates other than those whose character recognition result has a reliability equal to or higher than a predetermined threshold. The reference unit character region candidate is a unit character region candidate for which the reliability of the character recognition result obtained by the character recognition processing for printed characters performed by the recognition means is equal to or higher than a predetermined threshold.

According to the present invention, the accuracy of specifying a character region can be improved.

FIG. 1 is a diagram showing a system configuration including a character recognition device.
FIG. 2 is a diagram showing an example of an operation panel that implements the UI of the information processing device.
FIG. 3 is a diagram showing a software configuration for realizing the character recognition device of the first embodiment.
FIG. 4 is a diagram explaining the processing flow of the system, including character recognition processing.
FIG. 5 is a diagram showing an example of a document to be processed.
FIG. 6 is a diagram showing an example of a screen for extracting items from a document image.
FIG. 7 is a diagram explaining the character recognition process.
FIG. 8 is a diagram showing the result of character segmentation in a character string containing both handwritten and printed characters.
FIG. 9 is a diagram explaining the character region regeneration process.
FIG. 10(a) is a diagram showing how a printed character region is searched for within the target region of character region regeneration, and FIG. 10(b) is a diagram showing the result of character region regeneration.
FIG. 11(a) is a diagram showing the result of handwritten/printed determination in the second embodiment, and FIG. 11(b) is a diagram showing how the target region of character region regeneration is generated in the second embodiment.

Embodiments of the present invention are described below with reference to the drawings. The embodiments do not limit the present invention, and not all of the configurations described in the embodiments are essential to solving the problem addressed by the present invention.

<First embodiment>
[System configuration]
FIG. 1 is a diagram showing an information processing system according to the first embodiment. The information processing system has a reading device 100 and an information processing device 110. The reading device 100 has a scanner 101 and a reading device side communication unit 102. The scanner 101 reads a document and generates scanned document image data. The reading device side communication unit 102 communicates with external devices via a network.

The information processing device 110 has a system control unit 111, a ROM 112, a RAM 113, an HDD 114, a display unit 115, an input unit 116, and an information processing device side communication unit 117. The system control unit 111 reads control programs stored in the ROM 112 and executes various kinds of processing. The RAM 113 is used as the main memory of the system control unit 111 and as a temporary storage area such as a work area. The HDD 114 stores various data, various programs, and the like. The functions and processing of the information processing device 110 described later are realized by the system control unit 111 reading a program stored in the ROM 112 or the HDD 114 and executing it.

The information processing device side communication unit 117 performs communication processing with external devices via a network.

The display unit 115 displays various information processed on the information processing device 110. The display unit 115 may be an operation panel 201 mounted on a multifunction peripheral (MFP), as shown in FIG. 2, or a PC display or the like. The display unit 115 need not be built into the information processing device 110 and may be an external display connected to it. The input unit 116 has a keyboard and a mouse and accepts various operations by the user. The display unit 115 and the input unit 116 may be provided as a single unit, as in a touch panel. The display unit 115 may also project images with a projector, and the input unit 116 may use a camera to recognize the position of a fingertip relative to the projected image.

In this embodiment, the scanner 101 of the reading device 100 reads a paper document such as a form and generates scanned image data. The scanned image data is transmitted to the information processing device 110 by the reading device side communication unit 102. In the information processing device 110, the information processing device side communication unit 117 receives the scanned document image data, and the image is stored in a storage device such as the HDD 114. Some of the functions of the display unit 115 and the input unit 116 may be provided in the reading device 100.

[UI]
FIG. 2 is a diagram showing the UI (User Interface) displayed on the display unit 115 of the information processing device 110 in this embodiment. The operation panel 201 is a configuration example that implements the display unit 115 in an MFP or the like. The operation panel 201 includes a touch panel 202 and a numeric keypad 203 made up of physical keys. In FIG. 2, the ID of the logged-in user, the main menu, and the like are displayed at the upper left of the touch panel 202.

The UI in this embodiment also serves as one means of presenting to the user the results of information extraction from the document image data being processed, as shown in FIG. 6, and is provided on the touch panel 202. Display of the information extraction results is not limited to the touch panel 202 and may instead be performed on a PC display.

[Software configuration]
FIG. 3 is a diagram showing a software configuration that realizes the character recognition device 300 on the information processing device 110. The character recognition device 300 is made up of the means (303) belonging to the processing result providing unit 301 and the means (304 to 312) belonging to the character recognition result generation unit 302.

The processing result providing unit 301 is a display control unit that presents the processing results of the character recognition result generation unit 302 to the user. For example, it controls the user interface displayed on the touch panel 202 described above or on a PC display. The processing result providing unit 301 includes recognition result display means 303, which causes the display unit 115 to display the character recognition results obtained by the item extraction means 307 of the character recognition result generation unit 302.

The character recognition result generation unit 302 executes character recognition on the document image data input to the information processing device 110, generates character recognition results, and extracts from the document image data being processed the item values corresponding to the item names to be extracted. The character recognition result generation unit 302 includes image processing means 304, handwritten character recognition means 305, printed character recognition means 306, item extraction means 307, character string region extraction means 308, and character region segmentation means 309. As means for improving character recognition accuracy, it further includes handwritten/printed determination means 310, reference character selection means 311, and character region regeneration means 312.

The image processing means 304 preprocesses the input document image data so that character recognition processing can be executed on it. The character string region extraction means 308 extracts character string regions from the document image data. The character region segmentation means 309 cuts out, from each extracted character string region, a unit character region for each character to be recognized.

The handwritten character recognition means 305 converts each segmented unit character region into a character code (character recognition processing) using a recognition dictionary for handwritten characters. The printed character recognition means 306 converts each segmented unit character region into a character code (character recognition processing) using a recognition dictionary for printed characters. The item extraction means 307 identifies the items required by the user in the character strings obtained as character recognition results.

The handwritten/printed determination means 310 determines, for each unit character region cut out by the character region segmentation means 309, whether the character in that region is handwritten or printed. The reference character selection means 311 selects, for each character region, a reference character region from among the segmented unit character regions. The reference character region is used to specify printed character regions within unit character regions that the handwritten/printed determination means 310 could not conclusively determine to be handwritten or printed. The detailed processing steps for specifying printed character regions are described later with reference to FIGS. 7 to 10. The character region regeneration means 312 regenerates unit character regions, based on the reference character region, for unit character regions that the handwritten/printed determination means 310 could not conclusively determine to be handwritten or printed. The specific steps of this regeneration are described later with reference to FIG. 9.

[Processing flow]
FIG. 4 is a flowchart showing processing in which document image data obtained by reading a document with the reading device 100 is input to the information processing device 110, the user checks and corrects the results of character recognition processing on the document image data, and the confirmed character strings are registered.

First, in S400, the information processing device 110 acquires the document image data obtained by reading a document with the reading device 100. The document to be read is, for example, a receipt 500 as shown in FIG. 5.

In S401, the image processing means 304 corrects the skew of the character regions by detecting the writing direction of the character strings contained in the document image data.

In S402, the image processing means 304 binarizes the grayscale document image data using a given threshold.
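Fixed-threshold binarization of this kind can be sketched as below. The source does not specify the threshold or the image representation, so the value 128, the nested-list image format, and the convention that ink pixels become 1 are assumptions made for the illustration.

```python
def binarize(gray, threshold=128):
    """Binarize a grayscale image given as rows of 0-255 gray levels.

    Pixels darker than the (assumed) threshold become 1 (ink);
    all other pixels become 0 (background).
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


# Tiny 2x2 example: dark pixels map to 1, light pixels to 0.
binary = binarize([[0, 200], [127, 128]])
```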

In S403, the image processing means 304 removes from the document image data ruled lines that are not needed for character recognition. The processing performed in steps S401 to S403 is preprocessing that allows the character recognition in S404 to be executed accurately.

In S404, character recognition processing is performed on the document image data to which this preprocessing has been applied. The processing includes extraction of character string regions by the character string region extraction means 308, segmentation into regions for each unit character by the character region segmentation means 309, and handwritten/printed determination for each character region by the handwritten/printed determination means 310. It also includes character recognition processing by the handwritten character recognition means 305 and the printed character recognition means 306, selection of a reference character region by the reference character selection means 311, and regeneration of unit character regions by the character region regeneration means 312. Ultimately, character codes are obtained as the character recognition results for the character regions contained in the document image data.

In S405, the item extraction means 307 extracts the item values required by the user from the character strings obtained as character recognition results. Taking the receipt shown in FIG. 5 as an example, the item values are the values of preset items, such as "telephone number" and "amount", that are registered in the system later in S407. FIG. 6 shows an example of the UI screen presented to the user during item extraction. The UI screen 600 consists of a preview display area 601 for the document image data being processed, the item names 602 to be extracted, and the item values 603 extracted from the document image data displayed in the preview display area 601. Item values can be extracted, for example, by having the user indicate, on the document image data displayed in the preview display area 601, the position where the item value to be extracted is written.
Alternatively, a character string related to the item set as the item name 602 is searched for among the character strings extracted from the document image data, and the required item value is extracted from the position that corresponds to the found item name, based on a predefined positional relationship between item names and item values. This allows item values to be extracted automatically without instructions from the user. For example, to extract the item value "¥11,286" for the item name 602 "amount" when the corresponding item for the type of document being processed (a receipt in this case) is "receipt amount", a character string contained in that item, such as "amount", is searched for in the document image data. If a character region recognized as "amount" is found in the document image data, the character string corresponding to the item value is then searched for based on the positional relationship, predefined in the information processing device 110, between the "amount" character region and the character region in which the corresponding item value is written. In this case, the character string "¥11,286" is extracted as the item value based on the rule that the item value of the receipt amount lies to the right of the "amount" character string. The item value extraction methods described above are only examples, and other methods may be used as long as they extract the information the user wants.
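The rule-based extraction described above (find the item name, then take the value to its right) could look roughly like the following sketch. The `TextRegion` type, the function `find_value_right_of`, the coordinate convention, and the "same line" test are hypothetical choices for illustration; the source only states that a predefined positional relationship is used.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextRegion:
    """A recognized character string with its bounding box (hypothetical type)."""
    text: str
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


def find_value_right_of(regions: List[TextRegion], keyword: str) -> Optional[str]:
    """Return the text of the nearest region to the right of the keyword region.

    "To the right" is taken here to mean: starts at or after the keyword's
    right edge and vertically overlaps the keyword's center line.
    """
    label = next((r for r in regions if keyword in r.text), None)
    if label is None:
        return None

    center_y = label.y + label.h / 2
    candidates = [
        r for r in regions
        if r is not label
        and r.x >= label.x + label.w
        and r.y <= center_y <= r.y + r.h
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.x).text


regions = [
    TextRegion("金額", 10, 100, 40, 20),
    TextRegion("￥11,286", 60, 98, 90, 24),
    TextRegion("TEL", 10, 200, 30, 20),
]
amount = find_value_right_of(regions, "金額")
```

A real form would likely need one such rule per item name, since the positional relationship may differ by item and by document type.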

In S405, the character strings extracted as item values 603, which are character recognition results, are displayed on the display unit 115 by the recognition result display means 303. As shown in FIG. 6, one corresponding character string is displayed as the item value 603 for each item name 602. The user then checks and corrects the character string extracted as each item value 603 and ticks the check box 604 to indicate that checking and correction are complete.

In S406, when the recognition result display means 303 detects that all items have been checked, it displays a UI screen in which the "Next" button 605 is enabled. When the user presses the "Next" button, that is, when checking and correction of all items are complete (S406 is true), the process proceeds to S407.

In S407, the data is registered in the system, and all processing ends.

[Character recognition processing]
The character recognition processing in S404 of FIG. 4 is described in detail below with reference to the processing steps shown in FIG. 7 and the specific example in FIG. 8.

First, in S700, the character string region extraction means 308 extracts character string regions from the document image data. Character string regions are extracted using known techniques. For example, a projection histogram of the black pixels that make up the characters can be computed in the direction perpendicular to the writing direction of the character strings, and division positions can be determined from its shape and amount of change to obtain the character string regions. As a result, a character string region such as "2024年6月24日" shown in FIG. 8 is obtained from a document such as the one shown in FIG. 5.
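One simple form of the projection approach can be sketched as follows, assuming horizontal writing and a binarized image in which ink pixels are 1. It counts the ink pixels in each row and treats every run of non-empty rows as one text line band. The function names and the "any ink at all" criterion are assumptions; the source only says that division positions are chosen from the histogram's shape and amount of change.

```python
def find_runs(profile, min_count=1):
    """Return (start, end) pairs, end exclusive, of consecutive entries >= min_count."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value >= min_count and start is None:
            start = i                      # a run of ink begins
        elif value < min_count and start is not None:
            runs.append((start, i))        # the run ended at the previous index
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def extract_line_bands(binary):
    """Split a binarized page (rows of 0/1) into text line bands by row projection."""
    row_profile = [sum(row) for row in binary]  # ink pixels per row
    return find_runs(row_profile)


page = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
]
bands = extract_line_bands(page)
```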

In S701, the character region segmentation means 309 cuts out unit character region candidates (hereinafter simply called unit character regions). This uses known techniques. In one method, a projection histogram of the black pixels that make up the characters is computed in the horizontal or vertical direction, the character string pattern is divided linearly based on the histogram's shape and amount of change, each resulting pattern is enclosed in a rectangle, and the final cut is decided from the area and aspect ratio of that rectangle. Another method focuses on connected components, where characters touch one another, and divides them based on their shape and connection state. Using techniques such as these, each region judged to be a single character is delimited by a rectangle and treated as a unit character region. The rectangles drawn with broken lines in FIG. 8 are an example of the result of unit character region segmentation. This character string gives the item value corresponding to the "issue date" written at the upper right of the receipt shown in FIG. 5. Rectangle 801 shows an example in which the handwritten "4" and the printed "年" are close together, so the unit character region segmentation in S701 fails and the two characters end up in a single character region.
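The projection-based variant of this segmentation can be illustrated with a short sketch, assuming a horizontally written, binarized text line in which ink pixels are 1. It splits the line wherever a column contains no ink. The function name is hypothetical, and the rectangle area and aspect ratio checks mentioned above are omitted.

```python
def segment_characters(line):
    """Split a binarized text line (rows of 0/1) into character candidates.

    Returns (x_start, x_end) column ranges, end exclusive, separated by
    columns that contain no ink.
    """
    col_profile = [sum(col) for col in zip(*line)]  # ink pixels per column
    boxes, start = [], None
    for x, count in enumerate(col_profile):
        if count > 0 and start is None:
            start = x                    # a character candidate begins
        elif count == 0 and start is not None:
            boxes.append((start, x))     # a blank column ends the candidate
            start = None
    if start is not None:
        boxes.append((start, len(col_profile)))
    return boxes


# Three glyphs separated by blank columns 2 and 6.
line = [
    [1, 1, 0, 1, 1, 1, 0, 1],
    [1, 0, 0, 1, 0, 1, 0, 1],
]
boxes = segment_characters(line)
```

Because a split only happens at a completely blank column, two glyphs with no gap between them come out as one box. That is the kind of failure rectangle 801 illustrates.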

In S702, the handwritten/printed determination means 310 determines, for each unit character region, whether the character it contains is handwritten or printed. Known techniques may be used for this determination as well. One example is a method that computes a score from the image features and geometric features of a character and determines, for each unit character region cut out in S701, whether it is handwritten or printed. There are also known techniques that determine, line by line, whether a character line consists of handwritten or printed characters. However, because the present invention aims to make this determination for each unit character region, such line-level techniques cannot be used in S702. Any other technique may be used here as long as it determines, for each unit character region, whether the character is handwritten or printed. In the example shown in FIG. 8, this determination classifies "2", "0", "2", "6", "2", and "1", from the left of the character string, as handwritten and "月" and "日" as printed. The "4年" in rectangle 801, which contains both a handwritten and a printed character, is determined to be handwritten, printed, or unknown.

In S703, the handwritten character recognition means 305 executes OCR using a recognition dictionary for handwritten characters (handwritten character OCR) on the character regions determined to be handwritten in S702.

In S704, the printed character recognition means 306 executes OCR using a recognition dictionary for printed characters (printed character OCR) on each character area determined in S702 to contain a printed character.

As a result of S703 and S704, a character code and the reliability of the recognition result are obtained for each unit character area as the recognition result for the character image of that area. In the example shown in FIG. 8, handwritten character OCR is executed on "2", "0", "2", "6", "2", and "1", and printed character OCR is executed on "月" and "日". For rectangle 801 containing "4年", handwritten character OCR is executed if the area was determined to be handwritten in S702, and printed character OCR is executed if it was determined to be printed. If the area was determined to be "unknown", either OCR processing is skipped for this area or both handwritten character OCR and printed character OCR are executed.

In S705, the character area regenerating means 312 identifies the character areas for which unit character areas are to be regenerated (hereinafter, correction target areas). A character area is identified as a correction target area if it was determined in S702 to be neither handwritten nor printed, that is, if the handwritten/printed determination returned "unknown", or if the reliability of its OCR result in S703 or S704 is lower than a predetermined threshold. In other words, the character areas identified here resemble neither handwritten characters nor printed characters. In the example shown in FIG. 8, the character area of rectangle 801 is identified as a correction target area because it is either determined to be "unknown" in S702 or yields a recognition reliability below the predetermined threshold in S703 or S704. Because the reliability values of handwritten character OCR and printed character OCR tend to behave differently, a separate threshold is set for each.
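As an illustration only, the identification of correction target areas in S705 could be sketched as follows. The names `UnitRegion`, `HANDWRITTEN_THRESHOLD`, and `PRINTED_THRESHOLD`, and the threshold values themselves, are hypothetical and do not appear in the embodiment; the only points taken from the description are that "unknown" areas and low-reliability areas are selected, and that the two OCR types use separate thresholds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UnitRegion:
    label: str         # S702 result: "handwritten", "printed", or "unknown"
    confidence: float  # OCR reliability from S703/S704 (unused for "unknown")


# Hypothetical values; the description only states that each OCR type
# has its own threshold.
HANDWRITTEN_THRESHOLD = 0.60
PRINTED_THRESHOLD = 0.80


def is_correction_target(region: UnitRegion) -> bool:
    """Return True if the region should be regenerated (S705)."""
    if region.label == "unknown":
        return True
    if region.label == "handwritten":
        return region.confidence < HANDWRITTEN_THRESHOLD
    return region.confidence < PRINTED_THRESHOLD


def find_correction_targets(regions: List[UnitRegion]) -> List[int]:
    """Return the indices of all correction target areas."""
    return [i for i, r in enumerate(regions) if is_correction_target(r)]
```

An empty result corresponds to the branch in S706 in which no character cutting error is assumed.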

In S706, if the character area regenerating means 312 has not identified any correction target area, that is, if the reliability of the recognition result is equal to or higher than the predetermined threshold for every unit character area, it is determined that there is no character cutting error, and the process proceeds to S711.

On the other hand, if the character area regenerating means 312 has identified a correction target area in S706, the process proceeds to S707 in order to regenerate the unit character areas for that correction target area.

In S707, the reference character selection means 311 selects the reference character area needed for regenerating the character areas. The reference character area is a character area that serves as the basis for setting the size used when searching for printed characters within the correction target area (hereinafter, the search size). It is selected from among the character areas that had high reliability in the printed character OCR of S704 (hereinafter, reference character area candidates). The reference character area may be selected, for example, from the reference character area candidates that lie in the same character string as the correction target area. In the example shown in FIG. 8, the reference character area candidates are "月" and "日". The character string containing the correction target area is the character string area extracted by the character string area extracting means 308, that is, the character string area extracted in S700 of FIG. 7.

In S708, the character area regenerating means 312 determines the search size from the selected reference character areas. The search size is determined, for example, based on the maximum and minimum values of the width and of the height of the reference character areas. Because even printed characters vary in width and height depending on the character code, a plurality of search sizes are determined by varying the width and height between the minimum and maximum values. When the difference between the maximum and minimum values is large, for example when the character height level differs greatly depending on the character code, as with lowercase alphabetic letters, a plurality of search sizes are determined by varying the width and height in the neighborhood of the maximum value and in the neighborhood of the minimum value.
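The determination of search sizes in S708 might be sketched as below. This is a minimal sketch that covers only the first case (varying width and height between the minimum and maximum); the function names and the `steps` parameter are assumptions made for illustration and are not specified in the embodiment.

```python
from itertools import product
from typing import List, Tuple


def spaced_values(lo: int, hi: int, steps: int) -> List[int]:
    """Return up to `steps` evenly spaced integers from lo to hi inclusive."""
    if steps <= 1 or lo == hi:
        return [lo]
    return sorted({round(lo + (hi - lo) * i / (steps - 1)) for i in range(steps)})


def search_sizes(reference_sizes: List[Tuple[int, int]],
                 steps: int = 3) -> List[Tuple[int, int]]:
    """Build (width, height) search sizes from reference character areas.

    reference_sizes holds the (width, height) of each reference character area.
    """
    widths = [w for w, _ in reference_sizes]
    heights = [h for _, h in reference_sizes]
    ws = spaced_values(min(widths), max(widths), steps)
    hs = spaced_values(min(heights), max(heights), steps)
    return list(product(ws, hs))
```

For two reference areas of sizes (20, 30) and (24, 30), this yields three search sizes that differ only in width, since the heights coincide.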

In S709, the character area regenerating means 312 searches for printed characters in the correction target area using the search size determined from the reference character area, and regenerates the unit character areas. The details of this printed character search processing in the correction target area will be described later.

In S710, the handwritten character recognition means 305 and the printed character recognition means 306 perform OCR on the unit character areas regenerated in S709. For character area 1002, which was determined to be a printed character area in S709, either printed character OCR is performed or the printed character OCR result obtained for that area in S709 is reused. Character area 1003, the part of the correction target area other than the printed character area, is considered likely to contain a handwritten character, and handwritten character OCR is executed on it. As a result of S709, a character code, which is the OCR result, is obtained for each unit character area regenerated in the correction target area.

In S711, the item extracting means 307 integrates the handwritten character OCR results and printed character OCR results of S703, S704, and S710 according to the order of the coordinates of the respective unit character areas, and the character recognition processing of S404 ends. This integration is based on the recognition results of handwritten character OCR and printed character OCR and on the coordinates of the character areas.
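The integration in S711 can be pictured with the following sketch, which assumes a single horizontal character string so that ordering by the left x-coordinate is sufficient. The function name, the `(left_x, text)` tuple layout, and the coordinates in the usage example are hypothetical; the example string is loosely modeled on FIG. 8.

```python
from typing import List, Tuple

Result = Tuple[int, str]  # (left x-coordinate of the unit character area, text)


def merge_results(*result_lists: List[Result]) -> str:
    """Merge OCR results from several passes in left-to-right order."""
    merged = [item for results in result_lists for item in results]
    merged.sort(key=lambda item: item[0])
    return "".join(text for _, text in merged)


# Usage with made-up coordinates:
handwritten = [(0, "2"), (10, "0"), (20, "2"), (50, "6"), (70, "2"), (80, "1")]
printed = [(60, "月"), (90, "日")]
regenerated = [(30, "4"), (40, "年")]  # results of S710 for rectangle 801
print(merge_results(handwritten, printed, regenerated))
```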

The printed character search processing in the correction target area in S709 will now be described in detail. FIG. 9 is a flowchart explaining the printed character search processing in this embodiment. FIG. 10(a) shows the printed character search processing in progress for the example of FIG. 8, and FIG. 10(b) shows the result of that processing. The initial position of the search range is at the same height as the surrounding reference character areas and at the right end or left end of the correction target area whose unit character areas are to be regenerated. FIG. 10(a) shows an example in which search range 1000 is initially placed at the right end of the correction target area.

First, in S900, printed character OCR is performed on the search range set within the correction target area with the search size determined in S708, and a character code and a recognition reliability are obtained.

In S901, it is determined whether the recognition reliability of the printed character OCR is equal to or higher than a predetermined threshold. If the reliability is lower than the predetermined threshold, it is determined that the search range does not contain a printed character, and the process proceeds to S902. If the reliability is equal to or higher than the predetermined threshold, the process proceeds to S904.

In S902, it is determined whether there is room to move (shift) the search range horizontally within the correction target area. If there is room to shift the search range, the process proceeds to S903; if there is no room, the printed character search processing ends.

In S903, the search range is shifted. The shift is made in steps of a predetermined fixed width (for example, 5 pixels). Search range 1001 in FIG. 10(a) shows the state after search range 1000 has been moved several pixels to the left in the horizontal direction. After the search range has been moved, the process returns to S900.

In this way, the shift of the search range (S903) and the printed character OCR on the search range (S900) are repeated until there is no more room to shift the search range within the correction target area (NO in S902) or the reliability of the printed character OCR becomes equal to or higher than the predetermined threshold (YES in S901).

Because this search processing takes time, the pixel-by-pixel shifting of the search range may be omitted, and the search range may instead be limited to the right end and the left end of the correction target area.

In S904, the character area enclosed by the search range for which the printed character OCR reliability was determined to be high is taken as a new printed unit character area, and the correction target area is divided into this new printed unit character area and the remaining area. FIG. 10(b) shows the result of this division. Search range 1002, for which the reliability of printed character OCR is equal to or higher than the predetermined threshold, is cut out as a printed unit character area, and, with the pixels of that area masked, the remaining character area 1003 is taken as another unit character area. That is, by dividing the correction target area into a printed unit character area and another unit character area consisting of the remainder, two new unit character areas are regenerated, and the printed character search processing ends.
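The loop of S900 to S904 could be sketched as follows for a single search size, starting from the right end of the correction target area. This is only an illustrative sketch: the `ocr` callback stands in for the printed character OCR engine and is assumed to return a `(text, reliability)` pair, the box layout `(left, top, width, height)` is an assumption, and the masking of the found area and the handling of the remaining area are omitted.

```python
from typing import Callable, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height)
OcrFn = Callable[[Box], Tuple[str, float]]  # hypothetical printed-character OCR


def search_printed_char(target: Box, size: Tuple[int, int], top: int,
                        ocr: OcrFn, threshold: float,
                        step: int = 5) -> Optional[Tuple[Box, str]]:
    """Slide a window leftward from the right end of the target area.

    Returns the first window whose OCR reliability reaches the threshold,
    together with the recognized text, or None if no window qualifies.
    """
    left, _, width, _ = target
    win_w, win_h = size
    x = left + width - win_w          # initial position: right end
    while x >= left:                  # S902: is there still room to shift?
        window = (x, top, win_w, win_h)
        text, reliability = ocr(window)   # S900: printed character OCR
        if reliability >= threshold:      # S901: reliability check
            return window, text           # S904: new printed unit area
        x -= step                         # S903: shift by a fixed width
    return None
```

In practice the function would be called once per search size determined in S708; how the results for different sizes are combined is left open here.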

In this embodiment, the search range is shifted in the cycle of S900 to S903, printed character OCR is performed on the search range each time, and this is repeated until the reliability of the printed character OCR result becomes equal to or higher than the predetermined threshold. However, the processing is not limited to this method. For example, candidate areas for the search range may be obtained from within the correction target area all at once, printed character OCR may be executed on those candidate areas in a batch, and the area with the highest reliability may be taken as the new printed unit character area.
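The batch variant described above might look like the following sketch. As before, the `ocr` callback, the box layout `(left, top, width, height)`, and the function name are assumptions for illustration; the sketch simply enumerates all candidate windows, runs OCR on each, and keeps the one with the highest reliability.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height)
OcrFn = Callable[[Box], Tuple[str, float]]  # hypothetical printed-character OCR


def best_window(target: Box, sizes: List[Tuple[int, int]], top: int,
                ocr: OcrFn, step: int = 5) -> Optional[Tuple[Box, str, float]]:
    """Evaluate every candidate window and return the most reliable one."""
    left, _, width, _ = target
    candidates = [
        (x, top, w, h)
        for (w, h) in sizes
        for x in range(left, left + width - w + 1, step)
    ]
    if not candidates:
        return None
    # Run OCR on all candidates, then pick the highest reliability.
    scored = [(ocr(box), box) for box in candidates]
    (text, reliability), box = max(scored, key=lambda item: item[0][1])
    return box, text, reliability
```

Compared with the sequential loop, this trades extra OCR calls for independence from the scan direction, since the first window to cross the threshold is not necessarily the best one.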

The character recognition processing of this embodiment is realized as described above. A unit character area for which the reliability of the result of the handwritten/printed determination, handwritten character OCR, or printed character OCR is below a predetermined threshold is identified as a correction target area for which new unit character areas are to be regenerated. Printed character OCR is performed within the correction target area for each search range whose size is determined from a reference character area, that is, a printed unit character area whose printed character OCR result has a reliability equal to or higher than a predetermined threshold, and a new printed unit character area is searched for while referring to the reliability of those OCR results. Based on the result of the printed character search processing, a new printed unit character area and another unit character area consisting of the remainder are regenerated from the correction target area. This improves the accuracy with which character areas are specified and, in turn, the accuracy of character recognition.

<Second Embodiment>
In the first embodiment, the character string area extracted from the document image data was cut into unit characters, and each cut-out unit character area was determined to be handwritten or printed. In this embodiment, by contrast, a method that performs the handwritten/printed determination for each thin line portion is used. Thin line portions are extracted from an image in which handwritten characters and printed characters are mixed, the extracted thin line portions are decomposed into character strokes, and each character stroke is determined to be part of a handwritten character or part of a printed character based on a histogram of the pixel values within the stroke (see Patent Document 2). When this stroke-by-stroke determination is used, character cutting is performed after the handwritten/printed determination, handwritten character OCR is executed on unit character areas that contain strokes of handwritten characters, and printed character OCR is executed on unit character areas that contain strokes of printed characters. In other words, in this embodiment the order of S701 and S702 of the first embodiment is reversed, so that the unit character areas are cut out after the handwritten/printed determination.

FIG. 11(a) shows a case in which part of a printed character has been erroneously determined to be part of a handwritten character: some of the character strokes of "年" have been misjudged as belonging to a handwritten character. Here, rectangle 1100 contains the pixels determined to be part of a handwritten character, and rectangle 1101 contains the pixels determined to be part of a printed character. In this case, both the handwritten character OCR result for rectangle 1100 and the printed character OCR result for rectangle 1101 have low reliability. In the first embodiment, the handwritten/printed determination was performed on each unit character area after character cutting, and in S705 a unit character area was set as a correction target area if the determination result was "unknown" or if the reliability of handwritten character OCR or printed character OCR was low. In this embodiment, among the unit character areas whose handwritten character OCR result or printed character OCR result has low reliability, those whose mutual distance is equal to or less than a threshold, or which overlap one another, are identified, and the rectangle enclosing the pixels in those unit character areas is set as the correction target area. FIG. 11(b) shows rectangle 1103, which becomes the correction target area. Rectangles 1100 and 1101 shown in FIG. 11(a), that is, the unit character area containing pixels determined to be part of a handwritten character and the unit character area containing pixels determined to be part of a printed character, both have low OCR reliability and overlap each other as shown in FIG. 11(b). Therefore, rectangle 1103, which circumscribes rectangles 1100 and 1101, is set as the correction target area.
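The grouping of nearby or overlapping low-reliability unit character areas into a circumscribing rectangle could be sketched as below. The sketch assumes boxes given as `(left, top, right, bottom)`, measures distance only in the horizontal direction, and takes as input only areas already judged to have low OCR reliability; these choices, the function names, and the `max_gap` parameter are illustrative assumptions rather than details from the embodiment.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def horizontal_gap(a: Box, b: Box) -> int:
    """Horizontal distance between two boxes (negative if they overlap)."""
    return max(a[0], b[0]) - min(a[2], b[2])


def circumscribe(a: Box, b: Box) -> Box:
    """Smallest rectangle that contains both boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_low_reliability(boxes: List[Box], max_gap: int) -> List[Box]:
    """Merge low-reliability boxes that overlap or lie within max_gap."""
    merged: List[Box] = []
    for box in sorted(boxes):  # process from left to right
        if merged and horizontal_gap(merged[-1], box) <= max_gap:
            merged[-1] = circumscribe(merged[-1], box)
        else:
            merged.append(box)
    return merged
```

Each rectangle in the returned list would then be treated as one correction target area, in the manner of rectangle 1103.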

The above is how the correction target area for regenerating unit character areas is identified in this embodiment. Thereafter, as in the first embodiment, a reference character area is determined, printed characters are searched for in the correction target area using the search size calculated from the reference character area, and the unit character areas are regenerated.

With the above processing, the accuracy of specifying character areas, and thus the accuracy of character recognition, can be improved even when the handwritten/printed determination is performed in units of character strokes.

(Other Embodiments)
The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

110 Information processing device
300 Character recognition device
301 Processing result providing device
302 Character recognition result generating device

Claims

extracting means for extracting unit character region candidates corresponding to each of the plurality of characters from a character string region composed of a plurality of characters included in a read image obtained by reading a document;
Recognition means for performing character recognition processing for handwritten characters or printed characters on the candidates of the unit character regions, and obtaining character recognition results and reliability thereof;
correction means for correcting unit character area candidates, excluding unit character area candidates whose reliability of the character recognition result is equal to or higher than a predetermined threshold, from among the unit character area candidates, using reference unit character area candidates,
with
The reference unit character region candidate is a unit character region candidate for which the reliability of the character recognition result obtained by the character recognition processing for printed characters performed by the recognition means is equal to or higher than a predetermined threshold.
An information processing device characterized by:

The correction means performs the character recognition processing for printed characters on the candidates of the unit character regions to be corrected, using the candidates of the unit character regions serving as the reference, and the reliability of the character recognition result obtained by performing the character recognition processing. correcting the unit character area to be corrected to a plurality of new unit character area candidates including a new type unit character area candidate,
The information processing apparatus according to claim 1, characterized by:

The correction means sets a search range according to the size of the reference unit character area candidate, and sets the character for printing for each search range with respect to the correction target unit character area candidate. Determining a candidate for a unit character region of the new type based on the reliability of the character recognition result obtained by performing the recognition process;
3. The information processing apparatus according to claim 2, characterized by:

The correction means performs character recognition for printed characters while moving the search range in the horizontal direction at the same height as the reference unit character area candidates for the candidate unit character areas to be corrected. process,
4. The information processing apparatus according to claim 3, characterized by:

The correction means determines the search range based on the maximum and minimum values of the width and height of the candidates for the reference unit character area.
5. The information processing apparatus according to claim 3, wherein:

The correction means sets a search range having the highest reliability of a character recognition result obtained by performing character recognition processing for printed characters for each of the search ranges as a candidate for the unit character area of the new printed character.
6. The information processing apparatus according to any one of claims 3 to 5, characterized in that:

The correction means determines a search range in which the reliability of the result of character recognition processing for printed characters performed for each search range is equal to or higher than a predetermined threshold value as a candidate for the unit character area of the new printed character.
7. The information processing apparatus according to any one of claims 3 to 5, characterized in that:

the candidate for the unit character region to be corrected and the candidate for the reference unit character region are included in the same character string region;
8. The information processing apparatus according to any one of claims 1 to 7, characterized by:

The unit character region candidate to be corrected is one unit character region candidate,
9. The information processing apparatus according to any one of claims 1 to 8, characterized by:

The extraction means is
character cutting means for cutting out candidates for the unit character region from the character string region;
determination means for determining whether a character represented by the candidate for the unit character region is a handwritten character or a printed character;
including
The recognition means performs character recognition processing for handwritten characters on candidate unit character regions determined to be handwritten characters, and character for printed characters on candidates for unit character regions determined to be printed characters. perform recognition processing,
10. The information processing apparatus according to claim 9, characterized by:

The candidates for the unit character region to be corrected are the candidates for the plurality of unit character regions out of the candidates for the unit character region excluding the candidates for the unit character region whose reliability of the character recognition result is equal to or higher than a predetermined threshold. A plurality of unit character region candidates whose distance between is equal to or less than a predetermined threshold,
11. The information processing apparatus according to any one of claims 1 to 8, characterized by:

The extraction means is
determining means for determining whether each part constituting a character in the character string area is a handwritten character or a printed character;
character cutting means for cutting out candidates for the unit character region from the character string region based on the determination result of the portion;
including
The recognition means performs character recognition processing for handwritten characters on candidate unit character regions determined to be handwritten characters, and character for printed characters on candidates for unit character regions determined to be printed characters. perform recognition processing,
12. The information processing apparatus according to claim 11, characterized by:

a step of extracting unit character region candidates corresponding to each of the plurality of characters from a character string region composed of a plurality of characters included in a read image obtained by reading a document;
a step of performing character recognition processing for handwritten characters or printed characters on the candidates for the unit character region, and obtaining a character recognition result and its reliability;
a step of correcting unit character region candidates, excluding unit character region candidates whose reliability of the character recognition result is equal to or higher than a predetermined threshold, from among the unit character region candidates, using a reference unit character region candidate;
with
The reference unit character region candidate is a unit character region candidate for which the reliability of the character recognition result obtained by the character recognition processing for printed characters performed by the recognition means is equal to or higher than a predetermined threshold.
An information processing method characterized by:

A program for causing a computer to function as the information processing apparatus according to any one of claims 1 to 11.