JP2021144673A

JP2021144673A - Image processing apparatus, image processing method and program

Info

Publication number: JP2021144673A
Application number: JP2020148383A
Authority: JP
Inventors: 崇宮内; Takashi Miyauchi
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2020-03-12
Filing date: 2020-09-03
Publication date: 2021-09-24

Abstract

To extract an index of an extraction target even in a case where a position of a text block of a scan image is different from that of a registered document. SOLUTION: An image forming apparatus 100 comprises an image processing unit 305 including: acquisition means which acquires image data of a document; detection means which detects a text block in an image indicated by the image data; specification means which specifies a document corresponding to the image as a registered document on the basis of a prescribed rule; estimation means which estimates a text block of an item being a processing target in the image on the basis of a partial layout including the text block of the item of the processing target and at least one text block other than the text block of the item of the processing target out of the text blocks prescribed in the registered document; and extraction means which extracts a character string in the estimated text block as a character string of the item of the processing target. SELECTED DRAWING: Figure 4

Description

The present disclosure relates to a technique for extracting an index contained in an image.

There is a method of extracting the character string of a desired item (hereinafter referred to as an index) from a scanned image obtained by scanning a paper document, such as a form, with an image reading device. Extracting an index from the contents of a document requires OCR processing. However, running OCR on the entire scanned image increases the processing load and can lengthen the time the user has to wait.

Patent Document 1 discloses a method in which information on the areas containing indexes is registered in advance for each type of document, OCR processing is performed only on the registered index areas, and the indexes are extracted from the scanned image.

Patent Document 1: Japanese Unexamined Patent Publication No. 2019-128715

However, even among documents of the same type, the position of the character string area containing an index (hereinafter referred to as a text block) may shift depending on the contents written in the document. As a result, index extraction may fail even when OCR processing is performed only on the registered index area.

The technique of the present disclosure aims to extract the target index even when the position of a text block in the scanned image deviates from the registered position.

The image processing apparatus of the present disclosure comprises: acquisition means for acquiring image data of a document; detection means for detecting text blocks in the image indicated by the image data; specifying means for specifying, based on a predetermined rule, a document corresponding to the image as a registered document from among information that defines the layout of text blocks for each document group; estimation means for estimating the text block corresponding to an item to be processed in the image, based on a partial layout, which is a layout that includes, among the text blocks defined in the registered document, the text block corresponding to the item to be processed and at least one text block other than the text block corresponding to the item to be processed; and extraction means for extracting the character string in the estimated text block as the character string corresponding to the item to be processed.

According to the technique of the present disclosure, the target index can be extracted even when the position of a text block in the scanned image differs from that in the registered document.

A diagram showing a configuration example of the system.
A diagram showing a hardware configuration example of the image forming apparatus.
A diagram showing the functional configuration of the image forming apparatus.
A flowchart of the scanned-image file generation process.
A flowchart of the index extraction process.
A diagram showing an example of the block selection process.
A diagram showing an example of an index extraction rule.
A flowchart of the index block estimation process.
A diagram explaining how a pair block is determined.
A diagram showing an example of a partial pattern.
A diagram explaining the process of determining Y candidate positions.
A diagram showing an example of a histogram of shift amounts in the Y direction.
A diagram explaining the calculation of the degree of matching of a partial pattern.
A diagram explaining the calculation of the degree of matching of a partial pattern.
A diagram explaining how the partial pattern range is determined.
A flowchart of the index block estimation process.
A diagram showing an example of a partial pattern.
A diagram showing an example of an XY candidate position group.
A diagram showing an example of a similar position group.
A diagram explaining the correspondence between a similar position group and an XY candidate position group.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings. The following embodiments do not limit the technique of the present disclosure as set out in the claims, and not all combinations of the features described in the embodiments are necessarily essential to the solution provided by the technique of the present disclosure.

<Embodiment 1>
The image forming apparatus of the present embodiment scans a document and generates a file name by combining the character strings of predetermined items contained in the first page of the resulting scanned image. The generated file name is then recommended to the user as the file name of the scanned image. However, extracting the character strings of the predetermined items from the scanned image can increase the processing load.

One way to address this is to register, for each type of document, the position information of the text blocks of the predetermined items. The type of the scanned document is then identified, and the character strings of the predetermined items are extracted from the scanned image based on the text block positions registered for the identified document. Even in this case, however, the position of a text block in the scanned image may differ from the registered position, for example because the written contents have changed, even though the document type is the same.

For example, assume that the document of FIG. 11(a) is registered, and that information indicating the position of text block 1003 is registered as the area containing the character string of the issuing company name. FIG. 11(b) is a scanned image obtained by scanning a document of the same type as FIG. 11(a), but the table contains more item rows, so text block 1101, which holds the issuing company name to be extracted, is shifted downward relative to FIG. 11(a). Consequently, even if the document scanned to obtain FIG. 11(b) is correctly identified as being of the same type as FIG. 11(a), extraction of the character string indicating the issuing company name from the image of FIG. 11(b) may fail. FIG. 11(c) is described later.

For this reason, to extract the text block of an item contained in the scanned image, the present embodiment uses a layout made up of the text block representing that item in a document of the same type as the scanned document and at least one other text block. The present embodiment describes a method of searching the scanned image for a region that closely matches this layout and estimating, from the search result, the text block of the item contained in the scanned image.

In the present embodiment, coordinates within an image use, for example, a coordinate system whose origin is at the upper left, with the Y direction running vertically and the X direction running horizontally, along the direction in which character strings continue. The position of each text block is held, for example, as the coordinates of its upper-left corner.
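The coordinate convention and the failure mode described above can be illustrated with a minimal sketch. The `TextBlock` class, its field names, and all pixel values below are assumptions made for illustration; they do not come from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextBlock:
    """Axis-aligned text block; (x, y) is the upper-left corner in pixels."""
    x: int       # horizontal position, grows to the right
    y: int       # vertical position, grows downward (origin is upper left)
    width: int
    height: int

    def overlaps(self, other: "TextBlock") -> bool:
        """Return True when the two rectangles share any area."""
        return (
            self.x < other.x + other.width
            and other.x < self.x + self.width
            and self.y < other.y + other.height
            and other.y < self.y + self.height
        )


# Hypothetical values: the registered position of the issuing-company block,
# and the same block in a scan where extra table rows pushed it down 120 px.
registered = TextBlock(x=100, y=600, width=300, height=40)
scanned = TextBlock(x=100, y=720, width=300, height=40)

# A lookup at the registered position misses the shifted block entirely.
fixed_position_hit = registered.overlaps(scanned)
```

With these values `fixed_position_hit` is `False`, which is the situation that motivates matching a partial layout rather than a fixed position.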

[System configuration]
FIG. 1 is a diagram showing the overall configuration of a system to which the present embodiment can be applied. The system 105 of the present embodiment includes an image forming apparatus 100 and a terminal 101. As shown in FIG. 1, the image forming apparatus 100 is connected to a LAN 102 and can communicate with the terminal 101, such as a PC, via the Internet 103 or the like. In the present embodiment, the terminal 101 may be omitted, and the system may consist of the image forming apparatus 100 alone.

The image forming apparatus 100 is a multifunction peripheral (MFP) having a display/operation unit 123, a scanner unit 122, a printer unit 121 (see FIG. 2 for each), and so on. The image forming apparatus 100 can be used as a scan terminal that scans documents with the scanner unit 122. Its display/operation unit 123, which includes a touch panel and hard buttons, displays a user interface for showing recommended file names and storage destinations and for receiving instructions from the user.

[Hardware configuration of image forming apparatus]
FIG. 2 is a block diagram showing the hardware configuration of the image forming apparatus 100. The image forming apparatus 100 of the present embodiment includes the display/operation unit 123, the scanner unit 122, the printer unit 121, and a control unit 110.

The control unit 110 includes a CPU 111, a storage device 112 (ROM 118, RAM 119, HDD 120), a printer I/F unit 113, a network I/F unit 114, a scanner I/F unit 115, and a display/operation I/F unit 116. These units are connected via a system bus 117 so that they can communicate with one another. The control unit 110 controls the operation of the image forming apparatus 100 as a whole.

The CPU 111 reads and executes the control program stored in the storage device 112, and thereby functions as means for executing the processes in the flowcharts described later, such as reading control, image processing, and display control.

The storage device 112 stores and holds the control program, image data, metadata, setting data, processing result data, and the like. The storage device 112 includes the ROM 118, which is non-volatile memory, the RAM 119, which is volatile memory, and the HDD 120, which is a large-capacity storage area. The ROM 118 holds the control program and the like, and the CPU 111 reads the control program from it to perform control. The RAM 119 is used as the main memory of the CPU 111 and as a temporary storage area such as a work area.

The network I/F unit 114 connects the control unit 110 (image forming apparatus 100) to the LAN 102 via the system bus 117. The network I/F unit 114 transmits image data to external devices on the LAN 102 and receives various kinds of information from external devices on the LAN 102.

The scanner I/F unit 115 connects the scanner unit 122 and the control unit 110 via the system bus 117. The scanner unit 122 reads a document to generate scanned image data and inputs the scanned image data to the control unit 110 via the scanner I/F unit 115. The scanner unit 122 is equipped with a document feeder, which feeds the sheets placed on its tray one at a time so that multiple sheets can be read in succession.

The display/operation I/F unit 116 connects the display/operation unit 123 and the control unit 110 via the system bus 117. The display/operation unit 123 includes a liquid crystal display with a touch panel function, hard buttons, and the like.

The printer I/F unit 113 connects the printer unit 121 and the control unit 110 via the system bus 117. The printer unit 121 receives image data generated by the CPU 111 via the printer I/F unit 113 and uses the received image data to print on recording paper. With the hardware configuration described above, the image forming apparatus 100 according to the present embodiment can provide image processing functions.

[Functional configuration of image forming apparatus]
FIG. 3 is a block diagram showing the functional configuration of the image forming apparatus 100. Of the various functions of the image forming apparatus 100, FIG. 3 shows only those involved in scanning a document, digitizing it into a file, and saving it.

The display control unit 301 displays, on the touch panel of the display/operation unit 123, user interface screens (UI screens) for accepting various user operations. These operations include, for example, scan settings, an instruction to start scanning, file name settings, and an instruction to save a file.

In response to a user operation on the UI screen (for example, pressing the "Start scan" button), the scan control unit 302 instructs the scan execution unit 303 to execute scan processing and passes it the scan setting information. Following this instruction, the scan execution unit 303 causes the scanner unit 122 to read the document via the scanner I/F unit 115 and generates scanned image data. The generated scanned image data is saved to the HDD 120 by the scan image management unit 304.

The image processing unit 305 performs image analysis on the scanned image data, such as text block detection, OCR (character recognition), and similar-document determination, as well as image manipulation such as rotation and skew correction. Through the image processing unit 305, the image forming apparatus 100 also functions as an image processing apparatus. A character string area detected in a scanned image is also called a "text block". Details of the image processing are described later.

The functions of the units in FIG. 3 are realized by the CPU of the image forming apparatus 100 loading the program code stored in the ROM into the RAM and executing it. Alternatively, some or all of the functions of the units in FIG. 3 may be realized by hardware such as an ASIC or an electronic circuit.

[Flowchart of scanned image file generation process]
This section describes the overall process in which the image forming apparatus 100 reads a document, performs image processing on the scanned image of its first page, generates a file name from character strings contained in the scanned image, and recommends that file name to the user through the display/operation unit 123.

The series of processes shown in the flowchart of FIG. 4 is performed by the CPU of the image forming apparatus 100 loading the program code stored in the ROM into the RAM and executing it. Some or all of the steps in FIG. 4 may instead be realized by hardware such as an ASIC or an electronic circuit. The symbol "S" in the description of each process denotes a step in the flowchart, and the same applies to the subsequent flowcharts.

In S400, upon receiving a scan instruction from the user via the display/operation unit 123, the scan control unit 302 causes the scan execution unit 303 to read (scan) the documents one sheet at a time from the tray of the document feeder of the scanner unit 122. The scan control unit 302 then acquires the image data of the images obtained by scanning (referred to as scanned images).

In S401, the image processing unit 305 analyzes the image data acquired in S400 and extracts the indexes contained in the scanned image (index extraction process). An "index" is the character string of a predetermined item such as the document title, a management number, or a company name. In the present embodiment, indexes are used as the file name or metadata when the scanned image is saved. Details of the index extraction process in this step are described later with reference to FIG. 5.

The use of indexes is not limited to generating file names or extracting metadata. They may also be used to set other property information such as a folder path. In other words, the file name and metadata are just examples of the information that can be set as properties (attributes) of the scanned image data.

In S402, the display control unit 301 generates a file name from the indexes extracted in S401 and presents (recommends) the generated file name and metadata to the user by displaying them on the display/operation unit 123. The display control unit 301 also accepts the user's confirmation or correction of the presented file name. When a confirmation or correction is received from the user via the display/operation unit 123, the presented file name, or the corrected file name if a correction was made, is determined as the file name of the scanned image. If the user makes a correction via the display/operation unit 123, the index extraction rule is updated. The index extraction rule is described later.
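As a rough illustration of generating a file name by combining extracted indexes, the sketch below joins the index strings in a fixed order. The function name `build_file_name`, the item names, the underscore separator, the character sanitizing, and the `untitled` fallback are all assumptions for this example; the disclosure does not specify how the strings are combined.

```python
import re


def build_file_name(indexes, order, separator="_", extension=".pdf"):
    """Combine extracted index strings into a candidate file name.

    indexes: mapping of item name -> extracted character string.
    order:   item names in the order they should appear in the file name.
    """
    parts = []
    for item in order:
        value = indexes.get(item, "").strip()
        if value:
            # Replace whitespace and characters that are invalid in common
            # file systems with a hyphen (an illustrative choice).
            parts.append(re.sub(r'[\\/:*?"<>|\s]+', "-", value))
    if not parts:
        # Hypothetical fallback when no index could be extracted.
        return "untitled" + extension
    return separator.join(parts) + extension


suggested = build_file_name(
    {"title": "Invoice", "company": "ABC Corp.", "number": "No. 12/34"},
    order=["title", "company", "number"],
)
```

The resulting string would then be shown to the user as a recommendation that can be confirmed or edited.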

In S403, the image processing unit 305 creates a file from the image data acquired in S400 and sets the file name determined in S402. In the present embodiment, the scanned image is saved, as an example, in PDF (Portable Document Format). PDF allows image data to be stored page by page, so when multiple sheets are scanned in S400, the image data for each sheet is saved as a separate page in a single file.

In S404, the scan image management unit 304 transmits the file created in S403 to a predetermined destination via the LAN 102.

[About index extraction processing (S401)]
FIG. 5 is a flowchart showing the details of the index extraction process of S401, which is described below. For one page of image data, the index extraction process corrects the orientation, identifies the document type, and extracts the indexes according to the document type.

In S500, the image processing unit 305 detects the skew angle of the scanned image from the image data and corrects the skew by rotating the image by the detected angle in the opposite direction. The skew to be corrected arises, for example, when the document is not read straight during scanning because of wear on the rollers in the document feeder of the scanner unit 122, or when the scanned document itself was not printed straight.

To detect the skew angle, the objects contained in the image data are first detected, and objects that are adjacent in the horizontal or vertical direction are linked into groups. The skew is then obtained by deriving how far the line through the center positions of the linked objects is tilted from the horizontal or vertical direction. The skew detection method is not limited to this. For example, the center coordinates of the objects contained in the image data may be acquired and rotated in 0.1-degree increments, and the angle at which the largest proportion of the center coordinates line up horizontally or vertically may be taken as the skew of the scanned image. Correcting the skew of the scanned image improves the accuracy of the subsequent rotation correction, block selection, and OCR processes.
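The second detection method (rotating the center coordinates in 0.1-degree steps and keeping the angle at which they line up best) might be sketched as follows. This is an illustrative reading, not the disclosed implementation: the function name, the search range of plus or minus 5 degrees, the 1-pixel row bins, and the sum-of-squares alignment score are assumptions, and only horizontal alignment is checked.

```python
import math
from collections import Counter


def estimate_alignment_angle(centers, max_angle=5.0, step=0.1, bin_size=1.0):
    """Return the rotation in degrees that best aligns centers into rows.

    centers: iterable of (x, y) object center coordinates.
    The scanned image's skew is the opposite of the returned rotation.
    """
    best_angle, best_score = 0.0, -1
    steps = int(round(max_angle / step))
    for i in range(-steps, steps + 1):
        angle = i * step
        rad = math.radians(angle)
        sin_a, cos_a = math.sin(rad), math.cos(rad)
        # Bin the y coordinate of every center after rotating it by `angle`.
        rows = Counter(
            round((x * sin_a + y * cos_a) / bin_size) for x, y in centers
        )
        # Assumed score: the sum of squared bin counts rewards angles that
        # concentrate many centers in the same row.
        score = sum(n * n for n in rows.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

For centers that originally sat on horizontal rows and were then rotated by 2 degrees, the function returns the rotation of about -2 degrees that undoes the skew. A production implementation would likely also test vertical alignment and use a tolerance suited to the scan resolution.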

In S501, the image processing unit 305 rotates the skew-corrected scanned image obtained in S500 in 90-degree units so that the characters in the image are upright. One rotation correction method takes the skew-corrected scanned image as a reference image and prepares four images: the reference image, the reference image rotated by 90 degrees, the reference image rotated by 180 degrees, and the reference image rotated by 270 degrees. A simple, fast OCR process is then run on each image, and the image with the largest number of characters recognized with a confidence at or above a certain value is taken as the rotation-corrected image. The rotation correction method is not limited to this. Unless otherwise noted, "scanned image" hereinafter refers to the scanned image corrected in S500 and S501.
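The four-way rotation check can be sketched as below. The OCR engine is deliberately left as a caller-supplied function, because the disclosure does not name one; `ocr` is a hypothetical callable that returns `(character, confidence)` pairs, and the 0.8 confidence threshold and the list-of-lists image representation are likewise assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]  # row-major grid of pixel values (illustrative)
OcrFn = Callable[[Image], Sequence[Tuple[str, float]]]


def rotate90(image: Image) -> Image:
    """Rotate a row-major grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def pick_upright(image: Image, ocr: OcrFn, min_confidence: float = 0.8):
    """Return (angle, rotated_image) for the best of the four rotations.

    The best rotation is the one where `ocr` recognizes the most characters
    with confidence >= min_confidence.
    """
    best_angle, best_image, best_hits = 0, image, -1
    candidate = image
    for angle in (0, 90, 180, 270):
        hits = sum(1 for _, conf in ocr(candidate) if conf >= min_confidence)
        if hits > best_hits:
            best_angle, best_image, best_hits = angle, candidate, hits
        candidate = rotate90(candidate)
    return best_angle, best_image


def fake_ocr(image: Image):
    """Stand-in OCR: confident only when the top-left pixel is set."""
    if image[0][0] == 1:
        return [("A", 0.95), ("B", 0.90)]
    return [("?", 0.30)]


angle, upright = pick_upright([[0, 0], [1, 0]], fake_ocr)
```

Here `fake_ocr` only stands in for a real recognizer so that the selection logic can be exercised; with it, the 90-degree rotation is selected.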

In S502, the image processing unit 305 executes block selection processing on the scanned image. Block selection is a process that classifies the image into foreground and background regions, divides the foreground regions into text blocks and other blocks, and thereby detects the text blocks.

Specifically, contour tracing is performed on the scanned image after it has been binarized to black and white, and clusters of pixels enclosed by black-pixel contours are extracted. For black-pixel clusters whose area exceeds a predetermined size, contour tracing is also applied to the white pixels inside them to extract white-pixel clusters, and black-pixel clusters are then extracted recursively from inside white-pixel clusters whose area is at or above a certain size. The black-pixel clusters obtained in this way are determined to be foreground regions. The foreground regions are then classified by size and shape into regions with different attributes. For example, a foreground region whose aspect ratio is close to 1 and whose size falls within a certain range is treated as a pixel cluster corresponding to a character, and a region in which neighboring characters can be grouped in good alignment is determined to be a character string region (TEXT). A flat pixel cluster is determined to be a line region (LINE). The area occupied by a black-pixel cluster that is at or above a certain size and contains well-aligned rectangular white-pixel clusters is determined to be a table region (TABLE). A region in which irregularly shaped pixel clusters are scattered is determined to be a photograph region (PHOTO). Pixel clusters of any other shape are determined to be picture regions (PICTURE). Among the regions divided by object attribute in this way, the foreground regions determined to have the character attribute (TEXT) are detected as text blocks.
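The attribute rules above can be read as a small decision procedure. The sketch below is one possible interpretation: the `Region` fields, the `CHAR` label, and every threshold (aspect-ratio limits, size ranges, minimum cell count) are hypothetical, and the step that groups neighboring character regions into TEXT blocks is omitted.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Bounding-box summary of one foreground pixel cluster (illustrative)."""
    width: int
    height: int
    white_rect_count: int = 0  # aligned rectangular white clusters inside
    scattered: bool = False    # irregular clusters scattered over the area


def classify(region, char_size=(8, 64), flat_ratio=10.0, table_min_side=100):
    """Assign an attribute label to a foreground region.

    All thresholds are assumed values chosen for the example.
    """
    long_side = max(region.width, region.height)
    short_side = max(1, min(region.width, region.height))
    ratio = long_side / short_side
    if ratio <= 1.5 and char_size[0] <= long_side <= char_size[1]:
        return "CHAR"     # character candidate, later grouped into TEXT
    if ratio >= flat_ratio:
        return "LINE"     # flat pixel cluster
    if short_side >= table_min_side and region.white_rect_count >= 2:
        return "TABLE"    # large cluster enclosing aligned white cells
    if region.scattered:
        return "PHOTO"    # scattered irregular clusters
    return "PICTURE"      # anything else
```

For example, `classify(Region(400, 3))` yields `"LINE"` and `classify(Region(500, 300, white_rect_count=12))` yields `"TABLE"` under these assumed thresholds.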

FIG. 6 is a diagram showing an example of the result of the block selection process. FIG. 6(a) shows a scanned image after rotation correction. FIG. 6(b) shows the result of block selection for the scanned image of FIG. 6(a), with the dotted rectangles representing foreground regions. Although attributes have been determined for all foreground regions in FIG. 6(b), they are labeled for only some of them. The information on each text block detected in this step (its attribute, position, and size) is used in subsequent processing such as OCR and similarity calculation.

本ステップのブロックセレクション処理ではテキストブロックだけを検出する。その理由は、文字列の位置はスキャン画像の構造をよく表現し、インデックス情報と密接に関連するためである。したがって、写真領域や表領域等の他の属性を持つと判定されたブロックの情報を後続の処理で利用することを排除するものではない。 In the block selection process of this step, only the text block is detected. The reason is that the position of the character string expresses the structure of the scanned image well and is closely related to the index information. Therefore, it does not exclude the use of the information of the block determined to have other attributes such as the photo area and the table area in the subsequent processing.

Ｓ５０３において画像処理部３０５は、ＨＤＤ１２０からインデックス抽出ルールを取得しＲＡＭ１１９に展開する。 In S503, the image processing unit 305 acquires an index extraction rule from the HDD 120 and expands it into the RAM 119.

図７は、インデックス抽出ルール（以下単に、抽出ルールとよぶ）の一部を示す図である。図７は、抽出ルールに含まれる帳票ＩＤとして「０００１」が付与され登録されている抽出ルールのレコードを示している。抽出ルールでは、登録されている文書１つについて、「文書ＩＤ」と、「サムネイル」と、「文書識別情報」と、「インデックス情報」との各データが、レコード単位で対応付けられている。抽出ルールは登録済み文書の数だけこれらの組み合わせ（レコード）を保持する。文書ＩＤは、文書の種類を表すユニークなＩＤである。 FIG. 7 is a diagram showing a part of an index extraction rule (hereinafter, simply referred to as an extraction rule). FIG. 7 shows a record of an extraction rule in which "0001" is assigned and registered as a form ID included in the extraction rule. In the extraction rule, each data of "document ID", "thumbnail", "document identification information", and "index information" is associated with each registered document in record units. The extraction rule keeps these combinations (records) as many as the number of registered documents. The document ID is a unique ID that represents the type of document.

文書識別情報は、登録されている文書のスキャン画像に対してブロックセレクション処理を実行した結果得られるテキストブロックの位置およびサイズの情報である。文書識別情報は、文書の種類を特定するための情報であり後述する文書マッチングで使用される。 The document identification information is information on the position and size of a text block obtained as a result of executing a block selection process on a scanned image of a registered document. The document identification information is information for specifying the type of document and is used in document matching described later.

インデックス情報は、スキャン画像に含まれるインデックスを抽出するための情報である。インデックスは、ファイルに付与するファイル名またはメタデータを決定するために使用される。インデックス情報は、具体的には、登録されている文書内における、それぞれの項目の文字列（インデックス）が含まれるテキストブロックの座標およびサイズの情報が含まれる。図７の「インデックス情報」の画像７０１はそれぞれの項目が含まれるテキストブロックの位置およびサイズを画像上の座標に配置して図示したものである。また、インデックス情報にはファイル名を生成するために用いられるインデックスとその順番を示す情報、メタデータとして付与するための情報が含まれる。 The index information is information for extracting an index included in the scanned image. The index is used to determine the filename or metadata to give to the file. Specifically, the index information includes information on the coordinates and size of the text block including the character string (index) of each item in the registered document. The image 701 of the "index information" of FIG. 7 is illustrated by arranging the positions and sizes of the text blocks including the respective items at the coordinates on the image. In addition, the index information includes information indicating the index used to generate the file name and its order, and information to be assigned as metadata.

インデックス情報の「ファイル名ルール」には、タイトル（title）、発行元会社名（sender）、帳票番号（number）の項目のインデックスを、セパレータであるアンダースコアでつなげてファイル名を生成することが示されている。また、「メタデータ」には合計金額（total_price）の項目のインデックスをメタデータとして利用することが示されている。つまり、所定の項目のインデックスを抽出することで、ユーザにレコメンドするファイル名の生成、およびメタデータの抽出をすることができる。 The "file name rule" in the index information indicates that a file name is generated by joining the indexes of the title (title), issuing company name (sender), and form number (number) items with an underscore as the separator. The "metadata" entry indicates that the index of the total amount (total_price) item is used as metadata. In other words, extracting the indexes of predetermined items makes it possible to generate a file name to recommend to the user and to extract metadata.
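
The file name rule described above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the function name `build_file_name`, the dictionary layout, and the sample index strings are all assumptions introduced here.

```python
def build_file_name(indexes, rule=("title", "sender", "number"), separator="_"):
    """Join the extracted index strings in rule order using the separator.

    Items that were not extracted (missing or empty) are skipped, which is
    an assumption of this sketch rather than something the text specifies.
    """
    parts = [indexes[item] for item in rule if indexes.get(item)]
    return separator.join(parts)


# Hypothetical extraction result for one scanned form.
extracted = {
    "title": "Invoice",
    "sender": "ABC Corp",
    "number": "R-1234",
    "total_price": "1,080",
}

file_name = build_file_name(extracted)                 # recommended file name
metadata = {"total_price": extracted["total_price"]}   # metadata entry
```

Keeping the rule as an ordered tuple of item names mirrors the idea that the index information stores both which indexes are used and their order.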

なお、本実施形態では、抽出されたインデックスをファイル名またはメタデータとして利用する例を示しているが、他のプロパティ情報であるファイルの送信先のフォルダ情報を決定するためのルールを保持してもよい。その場合も、インデックスを用いて生成されたプロパティ情報がＳ４０２でユーザにレコメンドされて、Ｓ４０３でプロパティ情報がスキャン画像のファイルに設定される。 Although this embodiment shows an example in which the extracted index is used as a file name or metadata, a rule for determining other property information, such as the destination folder of the file, may also be held. In that case as well, the property information generated using the index is recommended to the user in S402, and the property information is set for the scanned image file in S403.

また、登録されている文書の抽出ルールとして、図７の「サムネイル」に示したように、登録された文書に対応するスキャン画像のサムネイルを一緒に保持してもよい。 Further, as a rule for extracting the registered document, as shown in the “thumbnail” of FIG. 7, the thumbnail of the scanned image corresponding to the registered document may be held together.

Ｓ５０４において画像処理部３０５は、スキャン画像に対して文書マッチングを実行する。文書マッチングでは、スキャン画像を得るためにスキャンされた文書（入力文書）と同じ種類の文書が、抽出ルールに登録されている文書群にあるかどうかを判定する。そして、入力文書と同じ種類の文書が登録されていると判定された場合、その種類を特定する処理である。 In S504, the image processing unit 305 executes document matching on the scanned image. Document matching is a process that determines whether a document of the same type as the document scanned to obtain the scanned image (the input document) exists among the documents registered in the extraction rule and, if such a document is registered, identifies its type.

本実施形態では、まず、スキャン画像と、抽出ルールに登録されている夫々の文書と、を１対１で比較し、含まれるテキストブロックの形状および配置がどれだけ類似しているかを表す類似度の算出を行う。類似度の算出の方法として、例えば、スキャン画像のテキストブロック全体と、登録されている文書のテキストブロック全体で位置合わせを行う。そして、スキャン画像の各テキストブロックと登録されている文書の各テキストブロックとが重なる面積の総和の二乗（値Ａとする）を求める。さらにスキャン画像のテキストブロックの面積の総和と登録されている文書のテキストブロックの面積の総和との積（値Ｂとする）を求める。そして、値Ａを値Ｂで割った値を類似度とする方法がある。この類似度の算出を、スキャン画像と抽出ルールに登録されている全ての文書との間で行う。 In the present embodiment, the scanned image is first compared one-to-one with each document registered in the extraction rule, and a similarity is calculated that indicates how closely the shapes and arrangements of their text blocks resemble each other. One way to calculate the similarity is as follows. The text blocks of the scanned image as a whole are aligned with the text blocks of the registered document as a whole. Then the square of the total area over which the text blocks of the scanned image overlap the text blocks of the registered document is obtained (value A). Next, the product of the total area of the text blocks of the scanned image and the total area of the text blocks of the registered document is obtained (value B). The similarity is the value obtained by dividing value A by value B. This calculation is performed between the scanned image and every document registered in the extraction rule.
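
The A/B similarity described above can be sketched in a few lines. The sketch assumes each text block is an axis-aligned rectangle given as `(x, y, width, height)` and that the two block sets have already been aligned; the function names are illustrative and do not come from the source.

```python
def intersection_area(a, b):
    """Area of the overlap between two (x, y, w, h) rectangles."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    w = min(ax + aw, bx + bw) - max(ax, bx)
    h = min(ay + ah, by + bh) - max(ay, by)
    return max(w, 0) * max(h, 0)


def layout_similarity(scan_blocks, registered_blocks):
    """Similarity = (total overlap area)^2 / (scan area * registered area)."""
    overlap = sum(
        intersection_area(s, r) for s in scan_blocks for r in registered_blocks
    )
    scan_area = sum(w * h for _, _, w, h in scan_blocks)
    registered_area = sum(w * h for _, _, w, h in registered_blocks)
    if scan_area == 0 or registered_area == 0:
        return 0.0
    value_a = overlap ** 2                 # value A in the text
    value_b = scan_area * registered_area  # value B in the text
    return value_a / value_b
```

When the blocks within each layout do not overlap one another, identical layouts give a similarity of 1.0, and the value falls toward 0 as the overlap shrinks.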

そして、所定値以上の類似度であり、かつ、最も類似度が高い、抽出ルールに登録されている文書が、スキャンされた入力文書と同じ種類の文書と特定される。また、抽出ルールに、類似度が所定値以上の文書が無かった場合は、入力文書と同じ種類の文書は、抽出ルールには登録されていないと判定される。 Then, the document registered in the extraction rule having a degree of similarity equal to or higher than a predetermined value and having the highest degree of similarity is identified as a document of the same type as the scanned input document. If there is no document having a similarity equal to or higher than a predetermined value in the extraction rule, it is determined that the document of the same type as the input document is not registered in the extraction rule.
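
The selection rule above (highest similarity, accepted only if it reaches the predetermined value) might look like the following sketch. The name `match_document` and the threshold of 0.7 are assumptions made for illustration; the source does not give a concrete threshold.

```python
def match_document(similarities, threshold=0.7):
    """Return the ID of the best-matching registered document, or None.

    similarities maps a registered document ID to its similarity with the
    scanned image. None means no registered document of the same type.
    """
    if not similarities:
        return None
    best_id = max(similarities, key=similarities.get)
    if similarities[best_id] >= threshold:
        return best_id
    return None
```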

Ｓ５０５において画像処理部３０５は、Ｓ５０４で実行した文書マッチングの結果、入力文書と同じ種類の文書が抽出ルールに登録されていたかを判定する。入力文書が登録済み文書でなかった場合（Ｓ５０５がＮＯ）、本フローチャートの処理を終了する。登録済み文書でなかった場合は、前述したように新たにＩＤが付されて、Ｓ５０２で検出したテキストブロックのレイアウト情報等が抽出ルールに登録される。この場合、Ｓ４０２ではファイル名およびメタデータのユーザにレコメンドはされずに、表示制御部３０１は、ユーザによるファイル名の入力を受け付ける。表示制御部３０１は表示・操作部１２３を介してユーザから入力を受け付けると、入力されたファイル名がスキャン画像のファイル名として決定される。 In S505, the image processing unit 305 determines whether a document of the same type as the input document is registered in the extraction rule as a result of the document matching executed in S504. If the input document is not a registered document (NO in S505), the processing of this flowchart ends. If it is not a registered document, a new ID is added as described above, and the layout information of the text block detected in S502 is registered in the extraction rule. In this case, S402 does not recommend the file name and the metadata to the user, and the display control unit 301 accepts the input of the file name by the user. When the display control unit 301 receives an input from the user via the display / operation unit 123, the input file name is determined as the file name of the scanned image.

入力文書と同じ種類の文書が登録されている場合（Ｓ５０５がＹＥＳ）、Ｓ５０６において画像処理部３０５は、Ｓ５０４で入力文書と同じ種類と特定された抽出ルールの文書と同じ文書ＩＤを、スキャン画像に付与する。 When a document of the same type as the input document is registered (YES in S505), in S506 the image processing unit 305 assigns to the scanned image the same document ID as that of the document in the extraction rule that was identified in S504 as being of the same type as the input document.

Ｓ５０７において画像処理部３０５は、Ｓ５０６で付与された文書ＩＤに紐づいた抽出ルールに基づいて、スキャン画像内における抽出対象（処理対象）の項目のインデックスのテキストブロックを推定するインデックスブロック推定処理を実行する。タイトル、発行元会社名、帳票番号等の項目を示す文字列（インデックス）が含まれるテキストブロックをインデックスブロックと呼ぶことがある。インデックスブロック推定処理の詳細については、後述する。 In S507, the image processing unit 305 executes an index block estimation process that estimates, based on the extraction rule associated with the document ID assigned in S506, the text block containing the index of each item to be extracted (processing target) in the scanned image. A text block containing a character string (index) for an item such as the title, issuing company name, or form number may be called an index block. The details of the index block estimation process are described later.

Ｓ５０８において画像処理部３０５は、Ｓ５０７で推定された夫々の項目のインデックスブロック群に対して、部分的なＯＣＲを実行し、各インデックスブロックに対応する文字列をインデックスとして抽出する。 In S508, the image processing unit 305 executes partial OCR on the index block group of each item estimated in S507, and extracts the character string corresponding to each index block as an index.

［インデックスブロック推定処理（Ｓ５０７）について］ [About the index block estimation process (S507)]
図８は、Ｓ５０７のインデックスブロック推定処理のフローチャートである。インデックスブロック推定処理の詳細について図８を用いて説明する。なお、以下、登録文書とは、Ｓ５０３で取得した抽出ルールにおいて登録されている文書のうち、Ｓ５０６でスキャン画像に付与された文書ＩＤに対応する文書のことをいう。本フローチャートの説明では、登録文書は図７の文書ＩＤ「０００１」の文書であるものとして説明する。 FIG. 8 is a flowchart of the index block estimation process of S507. The details of the index block estimation process are described with reference to FIG. 8. Hereinafter, the registered document refers to the document, among those registered in the extraction rule acquired in S503, that corresponds to the document ID assigned to the scanned image in S506. In the description of this flowchart, the registered document is assumed to be the document with document ID "0001" in FIG. 7.

Ｓ８００において画像処理部３０５は、抽出ルールから、Ｓ５０６で付与された文書ＩＤに紐づいた文書識別情報を取得する。そして、画像処理部３０５は、スキャン画像内の全体のテキストブロックと、登録文書の全体のテキストブロックとで全体の位置合わせを行う。 In S800, the image processing unit 305 acquires the document identification information associated with the document ID assigned in S506 from the extraction rule. Then, the image processing unit 305 performs overall alignment with the entire text block in the scanned image and the entire text block of the registered document.

Ｓ４００で取得されたスキャン画像の入力文書は、登録文書と同じ種類の文書であり、夫々の項目は登録文書の項目と同じ座標に印刷される。しかし、印刷およびスキャンのタイミングまたは印刷時の機器による違い等により、スキャン画像上のテキストブロックの位置と登録文書のテキストブロックの位置とにズレが生じてしまうことがある。そこで、本ステップではそのズレの影響を軽減して以降の処理の精度を向上させるため、全体の位置合わせを行う。なお、本実施形態では、図５のＳ５００で傾き補正を行っているため、本ステップの全体の位置合わせでは、スキャン画像上のテキストブロック全体をシフト（平行移動）する補正のみを行う例について説明する。 The input document of the scanned image acquired in S400 is of the same type as the registered document, and each item is printed at the same coordinates as the corresponding item of the registered document. However, differences in the timing of printing and scanning, or in the device used for printing, may cause the positions of the text blocks on the scanned image to deviate from the positions of the text blocks of the registered document. In this step, therefore, overall alignment is performed to reduce the influence of this deviation and improve the accuracy of the subsequent processing. Because tilt correction is already performed in S500 of FIG. 5 in this embodiment, the overall alignment described here is an example in which only a shift (translation) of all the text blocks on the scanned image is corrected.

全体の位置合わせでは、登録文書のテキストブロックに対してどれだけスキャン画像のテキストブロックがシフトしているかというシフト量を算出して、シフト量だけスキャン画像の各テキストブロックがシフトするように座標の修正を行う。 In the overall alignment, the shift amount of how much the text block of the scanned image is shifted with respect to the text block of the registered document is calculated, and the coordinates are adjusted so that each text block of the scanned image is shifted by the shift amount. Make corrections.

図９は、スキャン画像のテキストブロックと登録文書のテキストブロックとを同じ座標系に描画した画像の一部分を切り出した図である。図９を用いて全体の位置合わせのためのシフト量の算出の具体的な手順を説明する。図９において、実線の矩形はスキャン画像内のテキストブロック群のうちから選択された１つのテキストブロック９００を示し、破線の矩形は、テキストブロック９００の周囲にある登録文書のテキストブロック９０１〜９０３を示している。また、図９において、一点鎖線の円９０４は、スキャン画像のテキストブロック９００の左上頂点を中心に一定距離を半径とした範囲を示している。 FIG. 9 shows a cut-out portion of an image in which the text blocks of the scanned image and the text blocks of the registered document are drawn in the same coordinate system. The specific procedure for calculating the shift amount for overall alignment is described with reference to FIG. 9. In FIG. 9, the solid-line rectangle indicates one text block 900 selected from the text blocks in the scanned image, and the broken-line rectangles indicate the text blocks 901 to 903 of the registered document located around the text block 900. The dash-dot circle 904 indicates the range within a fixed radius of the upper left vertex of the text block 900 of the scanned image.

シフト量の算出のために、スキャン画像の各テキストブロックと対応する候補となる登録文書のテキストブロック（ペアブロックとよぶ）を決定する。ここでスキャン画像のテキストブロックのペアブロックの決定について説明する。 In order to calculate the shift amount, a text block (called a pair block) of a registered document corresponding to each text block of the scanned image is determined. Here, the determination of the pair block of the text block of the scanned image will be described.

初めに、登録文書のテキストブロック９０１〜９０３のうち、スキャン画像内のテキストブロック群から選択された１つのテキストブロック９００の左上頂点を中心とする円９０４の中に、左上頂点が入るテキストブロックを探す。図９では、テキストブロック９０１、９０２が該当することになる。次に、スキャン画像のテキストブロック９００と、登録文書のテキストブロック９０１、９０２それぞれとのオーバラップ率を求める。オーバラップ率は、スキャン画像のテキストブロックと登録画像のテキストブロックとの左上頂点同士を合わせて、両テキストブロックの共通部分の面積を算出する。そして、（共通部分の面積）／（両テキストブロックのうち大きい方の面積）によって値を求めてオーバラップ率とする。 First, among the text blocks 901 to 903 of the registered document, the text blocks whose upper left vertex falls inside the circle 904, centered on the upper left vertex of the text block 900 selected from the scanned image, are searched for. In FIG. 9, the text blocks 901 and 902 qualify. Next, the overlap rate between the text block 900 of the scanned image and each of the text blocks 901 and 902 of the registered document is obtained. To obtain the overlap rate, the upper left vertices of the scanned-image text block and the registered-document text block are made to coincide, and the area of the portion common to both text blocks is calculated. The overlap rate is then (area of the common portion) / (area of the larger of the two text blocks).
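
As a rough sketch of the overlap rate defined above, assume each block is an `(x, y, width, height)` rectangle. Because the upper left vertices are made to coincide, only the widths and heights matter. The function name is an assumption of this sketch.

```python
def overlap_rate(scan_block, registered_block):
    """Overlap rate after aligning the upper left vertices of two blocks.

    With the corners aligned, the common portion is the rectangle spanned by
    the smaller width and the smaller height of the two blocks.
    """
    _, _, sw, sh = scan_block
    _, _, rw, rh = registered_block
    common_area = min(sw, rw) * min(sh, rh)
    larger_area = max(sw * sh, rw * rh)
    return common_area / larger_area if larger_area > 0 else 0.0
```

Two blocks of identical size give 1.0 regardless of their positions, so the rate measures similarity of shape rather than proximity. Proximity is handled separately by the circle test.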

オーバラップ率が、所定の条件を満たす登録文書のテキストブロックを、ペアブロックとする。所定の条件は、例えば、スキャン画像のテキストブロックとのオーバラップ率が、最大オーバラップ率に係数αを乗算した値以上であり、かつ、所定の閾値以上であることである。この場合において、係数αは最大オーバラップ率と近いオーバラップ率を持つ組合せを選択するためのもので、例えば０．５〜０．８のような１．０未満の値とする。また、所定の閾値は最低ラインを規定するものであり、例えば０．３〜０．７のような１．０未満の値とする。 A text block of a registered document whose overlap rate satisfies a predetermined condition is defined as a pair block. The predetermined condition is, for example, that the overlap rate of the scanned image with the text block is equal to or greater than the value obtained by multiplying the maximum overlap rate by the coefficient α and equal to or greater than a predetermined threshold value. In this case, the coefficient α is for selecting a combination having an overlap rate close to the maximum overlap rate, and is set to a value less than 1.0 such as 0.5 to 0.8. Further, the predetermined threshold value defines the minimum line, and is set to a value less than 1.0 such as 0.3 to 0.7.
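
The pair block condition can be sketched as below. Several points are assumptions of this sketch and are not stated in the source: the "maximum overlap rate" is taken to be the largest rate among the candidates inside the circle, and the radius, coefficient α, and threshold defaults are arbitrary values chosen within the ranges mentioned above.

```python
import math


def overlap_rate(a, b):
    """Overlap rate of two (x, y, w, h) blocks with upper left corners aligned."""
    common = min(a[2], b[2]) * min(a[3], b[3])
    larger = max(a[2] * a[3], b[2] * b[3])
    return common / larger if larger > 0 else 0.0


def find_pair_blocks(scan_block, registered_blocks,
                     radius=50.0, alpha=0.7, min_rate=0.5):
    """Return the registered blocks that qualify as pair blocks."""
    sx, sy = scan_block[0], scan_block[1]
    # Candidates: upper left vertex inside the circle around the scan block's.
    nearby = [r for r in registered_blocks
              if math.hypot(r[0] - sx, r[1] - sy) <= radius]
    rated = [(overlap_rate(scan_block, r), r) for r in nearby]
    if not rated:
        return []
    max_rate = max(rate for rate, _ in rated)
    # Keep blocks close to the best rate and above the absolute minimum.
    return [r for rate, r in rated
            if rate >= alpha * max_rate and rate >= min_rate]
```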

図９では、登録文書のテキストブロック９０１、９０２のうち、スキャン画像のテキストブロック９００と形状の近い、テキストブロック９０１のみがペアブロックとして選択される。所定の条件を満たすテキストブロックが他にもあればペアブロックは複数選択されることもある。このように、スキャン画像内から選択された１つのテキストブロックに対応するペアブロック群のそれぞれに対して、スキャン画像内から選択されたテキストブロックとの左上頂点のＸ方向およびＹ方向の差分量（シフト量）を算出する。そして、差分量をシフト量ヒストグラムに投票する。この場合のヒストグラムのビンの範囲は任意でよい。 In FIG. 9, of the text blocks 901 and 902 of the registered document, only the text block 901, whose shape is close to that of the text block 900 of the scanned image, is selected as a pair block. If other text blocks also satisfy the predetermined condition, more than one pair block may be selected. For each pair block corresponding to the one text block selected from the scanned image, the difference (shift amount) in the X and Y directions between its upper left vertex and that of the selected text block is calculated. The difference is then voted into a shift amount histogram. The bin width of the histogram may be chosen arbitrarily.

図９の場合、テキストブロック９００については、登録文書のテキストブロック９０１との左上頂点のＸ方向およびＹ方向の差分量（シフト量）が算出されて、シフト量がシフト量ヒストグラムに投票される。 In the case of FIG. 9, for the text block 900, the difference (shift amount) in the X and Y directions between its upper left vertex and that of the text block 901 of the registered document is calculated, and this shift amount is voted into the shift amount histogram.

スキャン画像内のテキストブロックに対応するペアブロック群を決定し、シフト量ヒストグラムに投票するまでの処理を、スキャン画像の全てテキストブロックに対してそれぞれ行う。そして、最終的に得られたシフト量ヒストグラムにおける最大のピーク点となる位置を決定する。決定された位置が示すシフト量を全体の位置合わせのシフト量とする。 The pair block group corresponding to the text block in the scanned image is determined, and the process of voting for the shift amount histogram is performed for all the text blocks of the scanned image. Then, the position of the maximum peak point in the finally obtained shift amount histogram is determined. The shift amount indicated by the determined position is used as the shift amount for overall alignment.
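
The voting and peak selection described above might be sketched as follows. The sketch assumes the pairs of (scanned block, pair block) have already been determined, uses a two-dimensional histogram keyed by binned (dx, dy), and picks a bin width of 4 pixels arbitrarily; none of these choices is prescribed by the source.

```python
from collections import Counter


def estimate_global_shift(pairs, bin_size=4):
    """Estimate the (dx, dy) shift from (scan_block, pair_block) pairs.

    Each pair votes for the binned difference between the upper left
    vertices; the most voted bin gives the overall shift amount.
    """
    votes = Counter()
    for scan_block, pair_block in pairs:
        dx = pair_block[0] - scan_block[0]
        dy = pair_block[1] - scan_block[1]
        votes[(round(dx / bin_size), round(dy / bin_size))] += 1
    if not votes:
        return (0, 0)
    (bin_x, bin_y), _ = votes.most_common(1)[0]
    return (bin_x * bin_size, bin_y * bin_size)


def apply_shift(blocks, shift):
    """Translate every (x, y, w, h) block by the estimated shift."""
    dx, dy = shift
    return [(x + dx, y + dy, w, h) for x, y, w, h in blocks]
```

Voting makes the estimate robust to a few wrongly paired blocks, since an outlier contributes only a single vote to a distant bin.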

なお、ノイズの影響が懸念される場合は、生成したシフト量ヒストグラムに対してスムージングを掛けてもよい。また、最大となるピーク点以外の局所的なピーク点についても、シフト量の候補として選び、その候補の中から全体の位置合わせに用いるシフト量を選んでもよい。例えば、シフト量の各候補について、スキャン画像のテキストブロックの座標をシフトさせて、図５のＳ５０４の文書マッチングと同様の類似度算出を行い、最も類似度が高くなる候補を、最終的なシフト量として決定してもよい。 If the influence of noise is a concern, the generated shift amount histogram may be smoothed. Local peak points other than the maximum peak point may also be selected as shift amount candidates, and the shift amount used for overall alignment may be chosen from among those candidates. For example, for each candidate shift amount, the coordinates of the text blocks of the scanned image may be shifted and the same similarity calculation as in the document matching of S504 in FIG. 5 performed, and the candidate giving the highest similarity may be determined as the final shift amount.

上記の手順で決定されたシフト量だけ、スキャン画像の各テキストブロックの座標をシフトすることで、位置合わせされたスキャン画像のテキストブロック群を得ることができる。なお、テキストブロックの位置合わせの方法は上記の方法に限るものではない。スキャン画像全体のシフト（平行移動）に関する補正のみを行う例について説明したが、印刷およびスキャンのズレとして、倍率に関するズレが想定される場合には、シフト量だけでなく、倍率のズレも考慮した位置合わせを行ってもよい。 By shifting the coordinates of each text block of the scanned image by the shift amount determined in the above procedure, the text block group of the aligned scanned image can be obtained. The method of aligning the text block is not limited to the above method. An example of performing only correction related to the shift (translation) of the entire scanned image has been described, but if a shift related to the magnification is expected as a shift between printing and scanning, not only the shift amount but also the shift in the magnification is taken into consideration. Alignment may be performed.

なお以下のステップにおけるスキャン画像またはスキャン画像のテキストブロック群は、この全体の位置合わせされたスキャン画像またはテキストブロック群を指すものとする。 The scanned image or the text block group of the scanned image in the following steps shall refer to the entire aligned scanned image or the text block group.

次に、Ｓ５０６で付与された文書ＩＤに紐づいた登録文書のインデックス情報を取得する。そしてＳ８０１でインデックス情報に含まれるインデックスの項目のいずれかを処理対象に選んでＳ８０１〜Ｓ８１０を繰り返す。そして、スキャン画像のテキストブロック群から、処理対象の項目のテキストブロックを推定する処理を行う。処理対象の項目に対する処理が終了すると、再度、未処理の項目の中から処理対象の項目が選択される。 Next, the index information of the registered document associated with the document ID given in S506 is acquired. Then, in S801, any one of the index items included in the index information is selected as the processing target, and S801 to S810 are repeated. Then, a process of estimating the text block of the item to be processed is performed from the text block group of the scanned image. When the processing for the item to be processed is completed, the item to be processed is selected again from the unprocessed items.

Ｓ８０１において画像処理部３０５は、登録文書のインデックス情報に登録されている項目のうち未処理のインデックスの項目を１つ選択して処理対象の項目とする。本実施形態では、図７のインデックス情報に保持されている、タイトル（title）、発行元会社名（sender）、帳票番号（number）、合計金額（total_price）の項目の何れかが処理対象として選択される。 In S801, the image processing unit 305 selects one unprocessed index item from the items registered in the index information of the registered document and sets it as the item to be processed. In the present embodiment, one of the items held in the index information of FIG. 7, namely the title (title), issuing company name (sender), form number (number), or total amount (total_price), is selected as the processing target.

Ｓ８０２において画像処理部３０５は、処理対象の項目の「部分パターン」を取得する。部分パターンには、登録文書に含まれるテキストブロックの一部のレイアウト（部分レイアウト）の情報と、部分レイアウトを含む範囲（部分パターン範囲）の情報と、が含まれる。 In S802, the image processing unit 305 acquires a "partial pattern" of the item to be processed. The partial pattern includes information on a partial layout (partial layout) of the text block included in the registered document and information on a range including the partial layout (partial pattern range).

図１０（ａ）は、図７で文書ＩＤ「０００１」として登録されている登録文書における、それぞれの項目のインデックスブロックの位置およびサイズを図示したものである。図１０（ａ）の破線の矩形は、タイトル、帳票番号、合計金額、発行元会社名のそれぞれの項目のインデックスブロック１０００〜１００３を表している。 FIG. 10A illustrates the position and size of the index block of each item in the registered document registered as the document ID “0001” in FIG. 7. The broken line rectangle in FIG. 10A represents the index blocks 1000 to 1003 of each item of the title, the form number, the total amount, and the issuing company name.

図１０（ｂ）は、「発行元会社名（sender）」の項目の部分パターンを示す図である。図１０（ｂ）の一点鎖線の矩形で表される範囲は、「発行元会社名（sender）」の項目の部分パターン範囲１００６を示す。部分パターン範囲１００６は、「発行元会社名（sender）」の項目のテキストブロックであるインデックスブロック１００３を基準として予め設定された値を使って決定される。 FIG. 10B is a diagram showing a partial pattern of the item of “sender”. The range represented by the rectangle of the alternate long and short dash line in FIG. 10B indicates the partial pattern range 1006 of the item of “sender”. The partial pattern range 1006 is determined using a preset value with reference to the index block 1003, which is a text block of the item "sender".

テキストブロック１００４、１００５は、登録文書における、部分パターン範囲１００６に少なくとも一部が含まれるテキストブロックを表している。このテキストブロック１００４、１００５と、インデックスブロック１００３で表される登録文書内の部分的なレイアウトが、発行元会社名の項目の部分レイアウトである。部分レイアウトは、処理対象の項目のテキストブロックと、処理対象の項目のテキストブロック以外の少なくとも１つのテキストブロックとで表される。レイアウトとは、夫々のテキストブロックの位置情報と、夫々のテキストブロックのサイズと、を表す情報である。 The text blocks 1004 and 1005 are text blocks of the registered document that lie at least partly within the partial pattern range 1006. The partial layout within the registered document formed by the text blocks 1004 and 1005 together with the index block 1003 is the partial layout of the issuing company name item. A partial layout consists of the text block of the item to be processed and at least one other text block. A layout is information representing the position and the size of each text block.

発行元会社名の項目の部分パターンに含まれる情報として、部分パターン範囲１００６と、インデックスブロック１００３とテキストブロック１００４および１００５とからなる部分レイアウトと、が決定される。このように、登録文書の夫々の項目に対応する部分パターンが決定されて記憶されている。 As the information included in the partial pattern of the item of the issuing company name, the partial pattern range 1006 and the partial layout including the index block 1003 and the text blocks 1004 and 1005 are determined. In this way, the partial patterns corresponding to each item of the registration document are determined and stored.
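
One way to picture a partial pattern is as a small record built from the index block and its neighbours. In the sketch below, the partial pattern range is obtained by growing the index block by fixed margins; the margin values, the dictionary keys, and the function names are assumptions for illustration, since the source says only that preset values are used.

```python
def rects_intersect(a, b):
    """True if two (x, y, w, h) rectangles share any area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def build_partial_pattern(index_block, registered_blocks,
                          margin_x=200, margin_y=100):
    """Build the partial pattern for one item of a registered document.

    The partial layout keeps every registered block that lies at least
    partly inside the partial pattern range, including the index block.
    """
    x, y, w, h = index_block
    pattern_range = (x - margin_x, y - margin_y,
                     w + 2 * margin_x, h + 2 * margin_y)
    layout = [b for b in registered_blocks if rects_intersect(b, pattern_range)]
    return {"range": pattern_range, "index_block": index_block, "layout": layout}
```

Because the pattern depends only on the registered document, it can be computed once at registration time and stored with the extraction rule.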

詳細は後述するが、本実施形態では、部分レイアウトと配置が類似または一致しているスキャン画像内の位置を探索して、スキャン画像内における処理対象の項目のテキストブロックを推定する。 Although details will be described later, in the present embodiment, a text block of an item to be processed in the scanned image is estimated by searching for a position in the scanned image whose arrangement is similar to or matching the partial layout.

図１０（ｃ）は、「タイトル(title)」の項目の部分パターンを示す図である。タイトルについても同様に、部分パターン範囲１００７と、タイトルのインデックスブロック１０００と部分パターン範囲１００７に含まれるテキストブロック１００１、１００８〜１０１３とからなる部分レイアウトと、が部分パターンとして決定されている。 FIG. 10 (c) is a diagram showing a partial pattern of the item of “title”. Similarly, for the title, a partial pattern range 1007 and a partial layout including the index block 1000 of the title and the text blocks 1001 and 1008-1013 included in the partial pattern range 1007 are determined as the partial pattern.

なお、部分パターン範囲１００７のサイズは、図１０（ｂ）の部分パターン範囲１００６と比べてサイズが異なる。このように項目の性質に応じて部分パターンサイズは異ならせてもよい。または、部分パターン範囲のサイズは、全ての項目で共通のサイズが用いられてもよい。部分パターン範囲のサイズの決定方法については実施形態２で説明する。 The size of the partial pattern range 1007 is different from that of the partial pattern range 1006 of FIG. 10B. In this way, the partial pattern size may be different depending on the nature of the item. Alternatively, as the size of the partial pattern range, a common size may be used for all items. The method of determining the size of the partial pattern range will be described in the second embodiment.

なお、部分パターンは、文書原稿をスキャンした後に行われるインデックス抽出処理の実行が行われるごとに決定される必要はない。例えば、文書の登録時において、項目ごとに部分パターンを決定し、図７で示した抽出ルールの一部として予め記憶させてもよい。つまり、Ｓ８０２では、記憶されている処理対象の項目の部分パターンが取得されればよい。 The partial pattern does not need to be determined each time the index extraction process performed after scanning the document document is executed. For example, when registering a document, a partial pattern may be determined for each item and stored in advance as a part of the extraction rule shown in FIG. 7. That is, in S802, it suffices to acquire the stored partial pattern of the item to be processed.

次のＳ８０３およびＳ８０４では、処理対象の項目の部分レイアウトとの一致度が高い領域のある、スキャン画像内の位置（ＸＹ候補位置）を決定する。ＸＹ候補位置の決定方法としては、例えば、テンプレートマッチングのようにスキャン画像内の探索範囲に対して部分パターンを走査して一致度を算出することで候補位置を推定してもよい。本実施形態では計算量を抑制させるため、探索範囲におけるＹ方向の候補となる位置を決定してＹ方向の位置（Ｙ位置）を絞り込む。その上で、Ｙ位置の候補（Ｙ候補位置）群それぞれにおいて、Ｘ方向に部分パターンを走査してＸＹ候補位置を決定することで、計算量を抑える方法を説明する。 In the next S803 and S804, a position (XY candidate position) in the scanned image having a region having a high degree of coincidence with the partial layout of the item to be processed is determined. As a method of determining the XY candidate position, for example, the candidate position may be estimated by scanning a partial pattern with respect to the search range in the scanned image and calculating the degree of matching as in template matching. In the present embodiment, in order to suppress the amount of calculation, a candidate position in the Y direction in the search range is determined and the position in the Y direction (Y position) is narrowed down. Then, in each of the Y position candidate (Y candidate position) groups, a method of suppressing the amount of calculation by scanning a partial pattern in the X direction to determine the XY candidate position will be described.

Ｓ８０３において画像処理部３０５は、スキャン画像のテキストブロック群から、登録文書における処理対象の項目の部分パターンのテキストブロックに類似するＹ候補位置群を決定する。 In S803, the image processing unit 305 determines a Y candidate position group similar to the text block of the partial pattern of the item to be processed in the registered document from the text block group of the scanned image.

図１１は、Ｙ候補位置群の決定処理を説明するための図である。処理対象の項目が発行元会社名（sender）であるものとして説明を行う。 FIG. 11 is a diagram for explaining the determination process of the Y candidate position group. The explanation is made assuming that the item to be processed is the issuing company name (sender).

図１１（ａ）は、登録文書における発行元会社名（sender）の部分パターンを示す図であり図１０（ｂ）と同様の図である。図１１（ｂ）は、スキャン画像であり破線の矩形は、位置合わせがされたテキストブロック群を表している。また、図１１（ｂ）で示したスキャン画像が示す文書は、登録文書「０００１」と同じ種類の文書として判定された文書であるが、図７の登録文書に比べて表構造内の項目行数が増えている例を示している。よって、スキャン画像における推定されるべき発行元会社名（sender）のインデックスブロック１１０１が、登録文書における発行元会社名（sender）のインデックスブロック１００３の位置と比較して下方向にシフトしてしまっている。 FIG. 11A shows the partial pattern of the issuing company name (sender) in the registered document and is the same as FIG. 10B. FIG. 11B shows a scanned image, in which the broken-line rectangles represent the aligned text blocks. The document shown by the scanned image in FIG. 11B was determined to be of the same type as the registered document "0001", but in this example the table structure contains more item rows than in the registered document of FIG. 7. As a result, the index block 1101 of the issuing company name (sender) that should be estimated in the scanned image is shifted downward relative to the position of the index block 1003 of the issuing company name (sender) in the registered document.

図１１（ｃ）は、発行元会社名の部分パターンに含まれる部分レイアウトを表すテキストブロック１００３〜１００５のうちの１つのテキストブロック１００３を、スキャン画像のテキストブロック群と同じ座標系に重畳させた図である。Ｙ候補位置群の決定について、部分パターン内のテキストブロック１００３に注目して図１１（ｃ）を用いて説明する。 In FIG. 11C, one of the text blocks 1003 to 1005 representing the partial layout included in the partial pattern of the issuing company name, the text block 1003, is superimposed on the same coordinate system as the text block group of the scanned image. It is a figure. The determination of the Y candidate position group will be described with reference to FIG. 11C, focusing on the text block 1003 in the partial pattern.

図１１（ｃ）の、一点鎖線の矩形で表される探索範囲１１００は、処理対象の項目のＹ候補位置群を決定するために探索する範囲を表している。破線の矩形で表されるテキストブロック１１０１〜１１０９は、図１１（ｂ）に示すスキャン画像のテキストブロックのうち、矩形の中心が探索範囲１１００の中にあるテキストブロックである。 The search range 1100 represented by the rectangular chain line in FIG. 11C represents a range to be searched for in order to determine the Y candidate position group of the item to be processed. The text blocks 1101 to 1109 represented by the broken line rectangles are the text blocks of the scanned image shown in FIG. 11B, in which the center of the rectangle is within the search range 1100.

Ｙ候補位置群の決定には、はじめに、部分レイアウトに含まれる１つのテキストブロック（図１１（ｃ）ではテキストブロック１００３）が選択される。そして選択されたテキストブロックをスキャン画像のテキストブロック群と同じ座標系に重畳し、探索範囲内のスキャン画像のテキストブロック（図１１（ｃ）ではテキストブロック１１０１〜１１０９）との矩形の中心のＹ位置の差分量をそれぞれ算出する。そして、算出された差分量がＹ方向のシフト量ヒストグラムに投票される。シフト量ヒストグラムのビンの範囲は任意でよい。 To determine the Y candidate position group, first, one text block included in the partial layout (text block 1003 in FIG. 11C) is selected. Then, the selected text block is superimposed on the same coordinate system as the text block group of the scanned image, and the Y of the center of the rectangle with the text block of the scanned image (text blocks 1101 to 1109 in FIG. 11C) within the search range. Calculate the difference amount of each position. Then, the calculated difference amount is voted in the shift amount histogram in the Y direction. The range of bins in the shift amount histogram may be arbitrary.

図１２は、Ｙ方向のシフト量ヒストグラムの例を示す図である。図１２（ａ）は、図１１（ｃ）における部分パターンのテキストブロック１００３と、スキャン画像のテキストブロック１１０２とのＹ位置の差分量を投票した後のシフト量ヒストグラムである。ｈは基準からのＹ方向の探索範囲の絶対値の上限を示している。テキストブロック１００３とテキストブロック１１０２とのＹ方向の差分量に従い、位置１２００に投票が行われている。同様に、部分パターンに含まれる１つのテキストブロックと、スキャン画像の探索範囲内の全てのテキストブロックとのＹ中心の差分量に応じた投票が行われる。この投票を、部分パターン内の全テキストブロックに対して行う。つまり、部分パターンのテキストブロック１００４、１００５についても、探索範囲内のテキストブロック１１０１〜１１０９とのＹ中心の差分量が算出されてシフト量ヒストグラムに投票される。そして、Ｙ方向のシフト量ヒストグラムを完成させる。なお、ノイズの影響が懸念される場合は、Ｙ方向の生成したシフト量ヒストグラムに対してスムージングを掛けてもよい。 FIG. 12 is a diagram showing an example of a shift amount histogram in the Y direction. FIG. 12A is a shift amount histogram after voting for the difference amount of the Y position between the text block 1003 of the partial pattern in FIG. 11C and the text block 1102 of the scanned image. h indicates the upper limit of the absolute value of the search range in the Y direction from the reference. Voting is performed at the position 1200 according to the difference amount in the Y direction between the text block 1003 and the text block 1102. Similarly, voting is performed according to the difference amount of the Y center between one text block included in the partial pattern and all the text blocks within the search range of the scanned image. This vote is made for all text blocks in the partial pattern. That is, for the text blocks 1004 and 1005 of the partial pattern, the difference amount of the Y center from the text blocks 1101 to 1109 in the search range is calculated and voted in the shift amount histogram. Then, the shift amount histogram in the Y direction is completed. If there is concern about the influence of noise, smoothing may be applied to the shift amount histogram generated in the Y direction.

図１２（ｂ）は最終的に生成されるＹ方向のシフト量ヒストグラムである。シフト量ヒストグラムの生成が完了した後、ヒストグラム内の位置１２０１〜１２０６に示すようなピーク点を決定し、各ピーク点のビンに応じたＹ方向のシフト量に基づきＹ候補位置群を決定する。 FIG. 12B is a finally generated shift amount histogram in the Y direction. After the generation of the shift amount histogram is completed, the peak points as shown in the positions 1201 to 1206 in the histogram are determined, and the Y candidate position group is determined based on the shift amount in the Y direction according to the bin of each peak point.
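The voting and peak selection described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the block representation `(x, y, width, height)`, the function names, the bin width `bin_size`, and the simple local-maximum rule used to pick peaks are all assumptions introduced here.

```python
from collections import Counter

def y_center(block):
    """block is (x, y, width, height); return the Y coordinate of its center."""
    x, y, w, h = block
    return y + h / 2.0

def y_shift_histogram(layout_blocks, scan_blocks, h_limit, bin_size=4):
    """Vote the Y-center difference of every (layout block, scan block) pair.

    Differences whose absolute value exceeds h_limit lie outside the
    search range and are ignored. bin_size is a hypothetical bin width.
    """
    hist = Counter()
    for r in layout_blocks:
        for q in scan_blocks:
            diff = y_center(q) - y_center(r)
            if abs(diff) <= h_limit:
                hist[round(diff / bin_size)] += 1
    return hist

def peak_shifts(hist, bin_size=4):
    """Return the Y shifts of local-maximum bins, most voted first."""
    peaks = [b for b, votes in hist.items()
             if votes > hist.get(b - 1, 0) and votes >= hist.get(b + 1, 0)]
    peaks.sort(key=lambda b: -hist[b])
    return [b * bin_size for b in peaks]

# Two layout blocks and the same arrangement found 40 pixels lower in the scan.
layout = [(0, 100, 50, 20), (0, 200, 50, 20)]
scan = [(0, 140, 50, 20), (0, 240, 50, 20)]
hist = y_shift_histogram(layout, scan, h_limit=100)
print(peak_shifts(hist))
```

Each returned shift corresponds to one Y candidate position. Smoothing the histogram before peak picking, as the text permits, would be an additional step not shown here.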

なお、図１１（ｃ）のＹ候補位置群を決定するための探索範囲１１００は、部分パターンのインデックスブロックの位置を基準に、あらかじめ設定された値で自動決定される。なお、探索範囲のサイズについては、全ての項目で共通の範囲を使用してもよいし、処理対象の項目の属性に応じて決定してもよい。例えば、タイトルのインデックスブロックは文書内で固定の位置にあることが多い。よって、処理対象の項目がタイトルの場合、探索範囲を狭くしても探索範囲から推定されるべきインデックスブロックが外れる可能性は低いため、探索範囲を狭く設定してもよい。探索範囲を狭くすることで、計算量を抑えつつ、余計な候補位置が決定されることを防ぐことができる。一方、項目が合計金額のインデックスブロックは、文書内の表構造の項目行数の変化に応じて、位置が上下に変化することがある。このため、処理対象の項目が合計金額の場合は他の項目よりも探索範囲を上下に広く設定してもよい。 The search range 1100 for determining the Y candidate position group in FIG. 11C is automatically determined with a preset value based on the position of the index block of the partial pattern. The size of the search range may be determined by using a range common to all items or according to the attributes of the items to be processed. For example, the title index block is often in a fixed position in the document. Therefore, when the item to be processed is a title, it is unlikely that the index block to be estimated from the search range is out of the search range even if the search range is narrowed, so the search range may be set narrow. By narrowing the search range, it is possible to prevent unnecessary candidate positions from being determined while reducing the amount of calculation. On the other hand, the position of the index block whose items are the total amount of money may change up and down according to the change in the number of item rows of the table structure in the document. Therefore, when the item to be processed is the total amount, the search range may be set wider up and down than other items.

Ｓ８０４において画像処理部３０５は、Ｓ８０３で決定された夫々のＹ候補位置を基準に、部分パターンの部分レイアウトとスキャン画像のテキストブロック群との一致度を導出する。 In S804, the image processing unit 305 derives the degree of coincidence between the partial layout of the partial pattern and the text block group of the scanned image based on the respective Y candidate positions determined in S803.

図１３は、スキャン画像内のある位置に処理対象の項目の部分レイアウトを重ねて置いた場合の、部分レイアウトとスキャン画像のテキストブロックのレイアウトとのの重なりの状態を示した図である。図１３を用いて、部分レイアウトとスキャン画像のテキストブロック群の一致度の導出方法について説明する。 FIG. 13 is a diagram showing a state in which the partial layout and the layout of the text block of the scanned image are overlapped when the partial layout of the item to be processed is superposed at a certain position in the scanned image. A method of deriving the degree of matching between the partial layout and the text block group of the scanned image will be described with reference to FIG.

図１３において、実線の矩形は、処理対象の項目の部分レイアウトを構成するテキストブロック１００３〜１００５である。一点鎖線の矩形は、部分パターン範囲１００６を表している。破線の矩形は、スキャン画像のテキストブロック１１０１、１１０４〜１１０６、１１０９を表す。斜線塗りつぶし領域１３０９、１３１０は、部分レイアウトのテキストブロック１００３〜１００５とスキャン画像のテキストブロックの重なっている領域を表している。 In FIG. 13, the solid line rectangles are the text blocks 1003 to 1005 that constitute the partial layout of the item to be processed. The alternate long and short dash line rectangle represents the partial pattern range 1006. The dashed rectangles represent the text blocks 1101, 1104 to 1106, and 1109 of the scanned image. The hatched areas 1309 and 1310 represent the areas where the text blocks 1003 to 1005 of the partial layout overlap the text blocks of the scanned image.

部分レイアウトとスキャン画像のテキストブロックとの一致度Ｓｃｏｒｅは、以下の式（１）で導出する。 The degree of matching Score between the partial layout and the text block of the scanned image is derived by the following equation (1).

上記式（１）において、Ｒは部分レイアウトを構成する全テキストブロックを表しており、またＮ_Rは部分レイアウトを構成するテキストブロックの総数を表す。図１３において、Ｒは、テキストブロック１００３〜１００５であり、Ｎ_Rは３である。 In equation (1) above, R represents all the text blocks constituting the partial layout, and N_R represents the total number of text blocks constituting the partial layout. In FIG. 13, R is the text blocks 1003 to 1005 and N_R is 3.

Correlation(r)は、部分レイアウトを構成する一つのテキストブロックｒの個別一致度である。テキストブロックｒの個別一致度Correlation(r)は、式（２）によって導出する。 Correlation (r) is the degree of individual matching of one text block r constituting the partial layout. The individual agreement degree Correlation (r) of the text block r is derived by Eq. (2).

OverlappingQは、テキストブロックｒと重なりのあるスキャン画像のテキストブロックの集合である。OverlapArea(r,q)は、テキストブロックｒとOverlappingQのテキストブロックのうちの１つのテキストブロックｑとの重なり領域の面積である。またＮ_OverlappingQはOverlappingQの総数を表す。 OverlappingQ is the set of text blocks of the scanned image that overlap the text block r. OverlapArea(r,q) is the area of the region where the text block r overlaps a text block q, which is one of the text blocks in OverlappingQ. N_OverlappingQ represents the number of text blocks in OverlappingQ.

図１３において、ｒをテキストブロック１００３とした場合、OverlappingQはテキストブロック１１０５のみでありOverlapArea(r,q)は領域１３０９である。ｒをテキストブロック１００５とした場合、OverlappingQは、テキストブロック１１０４のみでありOverlapArea(r,q)は領域１３１０が該当する。ｒをテキストブロック１００４とした場合、該当するOverlappingQは無いためＮ_OverlappingQは0であることから、Correlation(r)は0である。 In FIG. 13, when r is the text block 1003, OverlappingQ contains only the text block 1105, and OverlapArea(r,q) is the area 1309. When r is the text block 1005, OverlappingQ contains only the text block 1104, and OverlapArea(r,q) is the area 1310. When r is the text block 1004, there is no corresponding OverlappingQ, so N_OverlappingQ is 0 and Correlation(r) is therefore 0.

Area_rはテキストブロックｒの面積であり、Area_qはテキストブロックｑの面積である。 Area_r is the area of the text block r, and Area_q is the area of the text block q.
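Equations (1) and (2) are not reproduced in this text, so the following is only one plausible reading of the definitions above, not the equations of the embodiment. It assumes that Score is the mean of Correlation(r) over the N_R blocks of the partial layout, and that Correlation(r) is the mean over OverlappingQ of OverlapArea(r,q) divided by the larger of Area_r and Area_q. That normalization, the block representation `(x, y, width, height)`, and all function names are assumptions.

```python
def area(block):
    """Area of a block given as (x, y, width, height)."""
    return block[2] * block[3]

def overlap_area(r, q):
    """Area of the intersection of two axis-aligned rectangles (0 if disjoint)."""
    w = min(r[0] + r[2], q[0] + q[2]) - max(r[0], q[0])
    h = min(r[1] + r[3], q[1] + q[3]) - max(r[1], q[1])
    return w * h if w > 0 and h > 0 else 0

def correlation(r, scan_blocks):
    """Individual degree of matching of one layout block r (assumed form)."""
    overlapping_q = [q for q in scan_blocks if overlap_area(r, q) > 0]
    if not overlapping_q:          # N_OverlappingQ == 0, so Correlation(r) == 0
        return 0.0
    total = sum(overlap_area(r, q) / max(area(r), area(q))
                for q in overlapping_q)
    return total / len(overlapping_q)

def score(layout_blocks, scan_blocks):
    """Degree of matching: mean of Correlation(r) over all layout blocks."""
    return sum(correlation(r, scan_blocks)
               for r in layout_blocks) / len(layout_blocks)

# Three layout blocks; two coincide with scan blocks, one has no overlap.
layout = [(0, 0, 10, 10), (20, 0, 10, 10), (40, 0, 10, 10)]
scan = [(0, 0, 10, 10), (40, 0, 10, 10), (100, 100, 10, 10)]
print(score(layout, scan))
```

In this example two of the three layout blocks match perfectly and the third contributes 0, mirroring the case of text block 1004 in FIG. 13.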

なお、式（１）による一致度の導出では、スキャン画像のテキストブロックの数が多く、またテキストブロックの面積が大きいほど、個別一致度Correlation(r)の値は大きく導出されてしまうことがある。そこで、一致度Ｓｃｏｒｅは、以下の式（１）’に示すようにペナルティ項PenaltyTermを追加してもよい。 Note that, when the degree of matching is derived by equation (1), the individual degree of matching Correlation(r) may come out larger as the number of text blocks in the scanned image increases and as the area of those text blocks grows. Therefore, a penalty term PenaltyTerm may be added to the degree of matching Score, as shown in equation (1)' below.

式（１）’におけるペナルティ項PenaltyTermは、式（３）によって導出する。 The penalty term PenaltyTerm in equation (1)'is derived by equation (3).

TotalArea_Rは、部分レイアウトを構成する全テキストブロックの総面積である。図１３ではテキストブロック１００３〜１００５の総面積である。 TotalArea_R is the total area of all the text blocks that constitute the partial layout. In FIG. 13, it is the total area of the text blocks 1003 to 1005.

TotalArea_NonOverlappingQは、部分パターン範囲内に存在するスキャン画像のテキストブロックのうち、部分レイアウトを構成するテキストブロックの何れとも重ならないテキストブロック群の面積の総和である。図１３の場合、部分パターン範囲１００６内のテキストブロック１１０１、１１０４、１１０５、１１０６、１１０９のうちテキストブロック１００３〜１００５と重ならないテキストブロック１１０１、１１０６、１１０９の面積の総和である。 TotalArea_NonOverlappingQ is the sum of the areas of those text blocks of the scanned image within the partial pattern range that do not overlap any of the text blocks constituting the partial layout. In the case of FIG. 13, of the text blocks 1101, 1104, 1105, 1106, and 1109 within the partial pattern range 1006, it is the sum of the areas of the text blocks 1101, 1106, and 1109, which do not overlap the text blocks 1003 to 1005.

ペナルティ項を設けることによって、部分パターン範囲１００６内の部分レイアウトを構成するテキストブロックが存在しなかった範囲に、スキャン画像内のテキストブロックが存在する場合に一致度を減点するように調整することができる。よって、部分レイアウトを構成するテキストブロックが少ない場合であっても、部分パターン範囲内の部分レイアウトを構成するテキストブロックが存在しない領域の情報を活用して一致度を導出することができる。なお、一致度の導出方法は、上記の式による導出に限るものではなく、部分レイアウトとの一致度が決定できればよい。 By providing the penalty term, the degree of matching can be adjusted so that points are deducted when a text block of the scanned image exists in a part of the partial pattern range 1006 where no text block of the partial layout existed. Therefore, even when the partial layout consists of only a few text blocks, the degree of matching can be derived by also using the information on the areas within the partial pattern range that contain no text block of the partial layout. The method of deriving the degree of matching is not limited to the above equations; any method that can determine the degree of matching with the partial layout may be used.
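Equations (1)' and (3) are likewise not reproduced in this text. The sketch below shows one way a penalty of this kind could be computed, assuming that PenaltyTerm is the ratio TotalArea_NonOverlappingQ / TotalArea_R and that it is subtracted from the unpenalized Score. Both the ratio and the subtraction are assumptions, as are the function names and the rule that a scan block counts as inside the partial pattern range when it overlaps that range.

```python
def area(block):
    """Area of a block given as (x, y, width, height)."""
    return block[2] * block[3]

def overlaps(a, b):
    """True if two axis-aligned rectangles share a region of positive area."""
    return (min(a[0] + a[2], b[0] + b[2]) > max(a[0], b[0]) and
            min(a[1] + a[3], b[1] + b[3]) > max(a[1], b[1]))

def penalty_term(layout_blocks, scan_blocks, pattern_range):
    """Assumed form: TotalArea_NonOverlappingQ / TotalArea_R."""
    in_range = [q for q in scan_blocks if overlaps(q, pattern_range)]
    non_overlapping = [q for q in in_range
                       if not any(overlaps(q, r) for r in layout_blocks)]
    total_area_r = sum(area(r) for r in layout_blocks)
    return sum(area(q) for q in non_overlapping) / total_area_r

def penalized_score(base_score, layout_blocks, scan_blocks, pattern_range):
    """Assumed form of equation (1)': Score minus PenaltyTerm."""
    return base_score - penalty_term(layout_blocks, scan_blocks, pattern_range)

layout = [(0, 0, 10, 10), (20, 0, 10, 10)]
scan = [(0, 0, 10, 10),      # overlaps a layout block: no penalty
        (50, 20, 10, 10),    # inside the range, overlaps nothing: penalized
        (200, 200, 10, 10)]  # outside the range: ignored
print(penalty_term(layout, scan, (0, 0, 100, 50)))
```

With this form, a scan block that sits where the registered partial layout was empty lowers the result in proportion to its area, which is the adjustment the paragraph above describes.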

Ｓ８０４において画像処理部３０５は、Ｓ８０３で決定したＹ候補位置群のうちのいずれかのＹ候補位置に、インデックスブロックが位置するように部分パターン（部分レイアウトおよび部分パターン範囲）を置く。そして、画像処理部３０５は、部分パターンをＸ方向に走査して、各位置における一致度を導出する。画像処理部３０５は、これを全てのＹ候補位置群に対して行う。 In S804, the image processing unit 305 places a partial pattern (partial layout and partial pattern range) so that the index block is located at any Y candidate position in the Y candidate position group determined in S803. Then, the image processing unit 305 scans the partial pattern in the X direction to derive the degree of coincidence at each position. The image processing unit 305 performs this for all Y candidate position groups.

図１４は、Ｓ８０３で決定したＹ候補位置群のうちの一つのＹ候補位置における本ステップの処理を表した図である。図１４（ａ）において、実線の矩形は、部分レイアウトを構成するテキストブロック１００３〜１００５であり、一点鎖線の矩形は部分パターン範囲１００６を表している。また破線の矩形は、スキャン画像のテキストブロック１１０１、１１０５、１１０６を表し、斜線の領域は、部分レイアウトのテキストブロックとスキャン画像のテキストブロックとの重なっている領域を表している。また、図１４では、本ステップにおける処理が図１４（ａ）〜（ｅ）から順に処理が進むように示されており、探索範囲内で部分パターンをＸ方向に（左から右に）走査しながら、それぞれの位置における一致度を導出する様子を示している。同様の処理が夫々のＹ候補位置において行われる。 FIG. 14 is a diagram showing the processing of this step at the Y candidate position of one of the Y candidate position groups determined in S803. In FIG. 14A, the solid line rectangle represents the text blocks 1003 to 1005 constituting the partial layout, and the alternate long and short dash line rectangle represents the partial pattern range 1006. The dashed square represents the text blocks 1101, 1105, 1106 of the scanned image, and the shaded area represents the overlapping region of the text block of the partial layout and the text block of the scanned image. Further, in FIG. 14, the processing in this step is shown to proceed in order from FIGS. 14 (a) to 14 (e), and the partial pattern is scanned in the X direction (from left to right) within the search range. However, it shows how to derive the degree of coincidence at each position. Similar processing is performed at each Y candidate position.

Ｓ８０５において画像処理部３０５は、Ｓ８０４で導出した一致度が最大となる位置をＸＹ候補位置と決定する。例えば、図１４の場合、部分パターン（部分レイアウト）が、図１４（ｃ）に示す位置で一致度が最大となる。このため、図１４（ｃ）における部分レイアウトに含まれるインデックスブロックを示すテキストブロック１００３の位置が、ＸＹ候補位置として決定される。 In S805, the image processing unit 305 determines the position where the degree of coincidence derived in S804 is maximum as the XY candidate position. For example, in the case of FIG. 14, the degree of coincidence of the partial pattern (partial layout) is maximized at the position shown in FIG. 14 (c). Therefore, the position of the text block 1003 indicating the index block included in the partial layout in FIG. 14C is determined as the XY candidate position.
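The scan of S804 and the selection of S805 can be sketched as an exhaustive search over the Y candidate positions and a set of X shifts. This is an illustrative sketch only: `best_xy_shift`, the `score_fn` callback, the representation of positions as shifts relative to the registered layout, and the choice of X step are assumptions, and any function that returns a degree of matching could be passed as `score_fn`.

```python
def shift(block, dx, dy):
    """Translate a block given as (x, y, width, height)."""
    x, y, w, h = block
    return (x + dx, y + dy, w, h)

def best_xy_shift(layout_blocks, scan_blocks, y_shifts, x_shifts, score_fn):
    """Place the partial layout at every (dx, dy) and keep the best score.

    y_shifts plays the role of the Y candidate position group and
    x_shifts the positions visited while scanning in the X direction.
    Returns ((dx, dy), score) for the position with the highest score.
    """
    best_pos, best_score = None, float("-inf")
    for dy in y_shifts:
        for dx in x_shifts:
            moved = [shift(r, dx, dy) for r in layout_blocks]
            s = score_fn(moved, scan_blocks)
            if s > best_score:
                best_pos, best_score = (dx, dy), s
    return best_pos, best_score

def exact_match_ratio(moved, scan_blocks):
    """Toy score used only for this example: fraction of exactly matched blocks."""
    return len(set(moved) & set(scan_blocks)) / len(moved)

layout = [(0, 0, 10, 10), (20, 0, 10, 10)]
scan = [(30, 40, 10, 10), (50, 40, 10, 10)]
print(best_xy_shift(layout, scan, [0, 40], range(0, 60, 10), exact_match_ratio))
```

The returned shift, applied to the index block of the partial layout, gives the XY candidate position.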

Ｓ８０６において画像処理部３０５は、Ｓ８０５で決定したＸＹ候補位置における一致度が所定の閾値以上かどうかを判定する。 In S806, the image processing unit 305 determines whether or not the degree of coincidence at the XY candidate positions determined in S805 is equal to or greater than a predetermined threshold value.

一致度が閾値以上の場合（Ｓ８０６がＹＥＳ）、Ｓ８０７において画像処理部３０５は、Ｓ８０５で決定したスキャン画像上のＸＹ候補位置を処理対象の項目のテキストブロック（インデックスブロック）のある位置と推定する。画像処理部３０５は、推定した位置に基づき、スキャン画像内の処理対象の項目のインデックスブロックを推定する処理を行う。 When the degree of matching is equal to or greater than the threshold value (YES in S806), in S807 the image processing unit 305 estimates that the XY candidate position on the scanned image determined in S805 is the position where the text block (index block) of the item to be processed is located. The image processing unit 305 then performs a process of estimating the index block of the item to be processed in the scanned image based on the estimated position.

例えば、登録文書における処理対象の項目のインデックスブロックをスキャン画像内のＸＹ候補位置にシフトさせた場合に、重なり合うスキャン画像内のテキストブロックが、所定の条件を満たすかが判定される。所定の条件とは、例えば、登録文書における処理対象のインデックスブロックとの重なり度合いを示す重なり率が所定の値以上、かつ、登録文書における処理対象のインデックスブロックとの左上座標の距離が一定の範囲内に入っているかという条件である。 For example, when the index block of the item to be processed in the registered document is shifted to the XY candidate position in the scanned image, it is determined whether a text block of the scanned image that overlaps the shifted index block satisfies a predetermined condition. The predetermined condition is, for example, that the overlap rate indicating the degree of overlap with the index block to be processed in the registered document is equal to or greater than a predetermined value, and that the distance between the upper left coordinates of the text block and those of the index block to be processed in the registered document is within a certain range.

所定の条件を満たすテキストブロックがあると判定された場合（Ｓ８０７がＹＥＳ）、Ｓ８０８に進む。Ｓ８０８において画像処理部３０５は、Ｓ８０７で所定の条件を満たすと判定されたスキャン画像のテキストブロックを、Ｓ８０１で選択した処理対象の項目を示す文字列を含むテキストブロック（インデックスブロック）と推定する。 If it is determined that there is a text block satisfying a predetermined condition (YES in S807), the process proceeds to S808. In S808, the image processing unit 305 estimates that the text block of the scanned image determined in S807 to satisfy the predetermined condition is a text block (index block) including a character string indicating the item to be processed selected in S801.

一致度が閾値未満の場合（Ｓ８０６がＮＯ）または該当のテキストブロックがないと判定された場合（Ｓ８０７がＮＯ）、Ｓ８０９に進む。Ｓ８０９において画像処理部３０５は、Ｓ８０１で選択した処理対象の項目に対応するテキストブロックはスキャン画像内には無いと決定する。例えば、スキャン画像において処理対象の項目に対応する文字列が所定の領域に記載されていない場合、あるいは、Ｓ８０４で誤って位置を推定してしまった場合、Ｓ８０９において決定が行われる。 If the degree of matching is less than the threshold value (S806 is NO) or if it is determined that there is no corresponding text block (S807 is NO), the process proceeds to S809. In S809, the image processing unit 305 determines that the text block corresponding to the item to be processed selected in S801 is not included in the scanned image. For example, if the character string corresponding to the item to be processed is not described in the predetermined area in the scanned image, or if the position is erroneously estimated in S804, the determination is made in S809.
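The branching of S806 to S809 can be summarized in a short sketch. The threshold values, the definition of the overlap rate as the overlap area divided by the area of the shifted index block, the use of Euclidean distance between upper left corners, and the choice of the best-overlapping block when several qualify are all assumptions made for illustration; the text leaves these details open.

```python
import math

def area(block):
    """Area of a block given as (x, y, width, height)."""
    return block[2] * block[3]

def overlap_area(r, q):
    """Area of the intersection of two axis-aligned rectangles (0 if disjoint)."""
    w = min(r[0] + r[2], q[0] + q[2]) - max(r[0], q[0])
    h = min(r[1] + r[3], q[1] + q[3]) - max(r[1], q[1])
    return w * h if w > 0 and h > 0 else 0

def estimate_index_block(best_score, shifted_index_block, scan_blocks,
                         score_threshold=0.5, min_overlap_rate=0.5,
                         max_corner_distance=20.0):
    """Return the scan block estimated to be the index block, or None.

    shifted_index_block is the registered index block moved to the
    XY candidate position. All three thresholds are hypothetical.
    """
    if best_score < score_threshold:            # S806 NO -> S809
        return None
    best_block, best_rate = None, 0.0
    for q in scan_blocks:                       # S807: test each scan block
        rate = overlap_area(shifted_index_block, q) / area(shifted_index_block)
        dist = math.hypot(q[0] - shifted_index_block[0],
                          q[1] - shifted_index_block[1])
        if (rate >= min_overlap_rate and dist <= max_corner_distance
                and rate > best_rate):
            best_block, best_rate = q, rate
    return best_block                           # S808 if found, else S809

shifted = (100, 100, 50, 20)
scan = [(102, 101, 50, 20), (300, 300, 50, 20)]
print(estimate_index_block(0.9, shifted, scan))
```

A return value of `None` corresponds to S809, where the item is judged to have no corresponding text block in the scanned image.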

Ｓ８１０において画像処理部３０５は、登録文書のインデックス情報に登録されている全ての項目について、インデックスブロックを推定する処理を完了したかを判定する。未処理の項目があればＳ８０１に戻る。 In S810, the image processing unit 305 determines whether or not the process of estimating the index block has been completed for all the items registered in the index information of the registered document. If there is an unprocessed item, the process returns to S801.

全ての項目について処理が完了していれば本フローチャートの処理を終えＳ５０８に進む。Ｓ５０８において画像処理部３０５は、推定された夫々の項目のインデックスブロックにＯＣＲ処理を実行し、それぞれの項目に対応する文字列をインデックスとして抽出する。 If the processing for all the items is completed, the processing of this flowchart is completed and the process proceeds to S508. In S508, the image processing unit 305 executes OCR processing on the index block of each of the estimated items, and extracts the character string corresponding to each item as an index.

以上説明したように本実施形態では、テキストブロックのレイアウトの一部を利用してスキャン画像に含まれるインデックスの抽出を行う。このため、本実施形態によれば、入力文書おける記載内容の変化等によって、スキャン画像に含まれるインデックスブロックの位置が登録文書と異なる場合であっても、インデックスを抽出することができる。また、本実施形態では、文書マッチングによって入力文書の種類を特定して、文書の種類に紐づいた抽出ルールを利用する。このため、テキストブロックの部分的なレイアウトによるインデックスブロックを推定する処理であっても、インデックスの誤抽出を抑制することができる。また、文書マッチングおよびインデックスブロック推定処理では、ＯＣＲ処理の前処理の結果として得られる前景領域のうちテキストブロックのみを使用する。このため、余計な計算コストをかけることなく、インデックス抽出処理を行うことができる。 As described above, in the present embodiment, the index included in the scanned image is extracted by using a part of the layout of the text block. Therefore, according to the present embodiment, the index can be extracted even when the position of the index block included in the scanned image is different from that of the registered document due to a change in the description content in the input document or the like. Further, in the present embodiment, the type of the input document is specified by document matching, and the extraction rule associated with the document type is used. Therefore, even in the process of estimating the index block by the partial layout of the text block, it is possible to suppress the erroneous extraction of the index. Further, in the document matching and index block estimation processing, only the text block in the foreground area obtained as a result of the preprocessing of the OCR processing is used. Therefore, the index extraction process can be performed without incurring an extra calculation cost.

＜実施形態２＞ <Embodiment 2>
実施形態１では、部分パターン範囲は、予め設定された値に基づき決定する方法について説明した。しかしながら、部分パターン範囲を広く設定しすぎると、インデックスブロックの周囲のみレイアウトが変わっているような場合、適切にインデックスブロックの位置を推定することができない。一方、部分パターン範囲が小さくなると部分レイアウトを構成するテキストブロックの数が少なく決定されることがあり、スキャン画像内の一致度の高い領域を探索するのが難しくなる。このため本実施形態では、部分パターン範囲を適切なサイズに決定する方法を説明する。なお、本実施形態については、実施形態１からの差分を中心に説明する。特に明記しない部分については実施形態１と同じ構成および処理である。 In the first embodiment, a method of determining the partial pattern range based on a preset value has been described. However, if the partial pattern range is set too wide, the position of the index block cannot be estimated appropriately when the layout changes only around the index block. On the other hand, when the partial pattern range is small, the number of text blocks constituting the partial layout may be determined to be small, which makes it difficult to search for a region with a high degree of matching in the scanned image. Therefore, in the present embodiment, a method of determining the partial pattern range to an appropriate size will be described. This embodiment will be described mainly in terms of its differences from the first embodiment. The parts not specifically mentioned have the same configuration and processing as in the first embodiment.

文書の種類に応じてインデックスブロックの周囲に存在するテキストブロックの数、レイアウトは変わる。このため、本実施形態では、部分パターン範囲のサイズを決定するために、段階的に対象の項目のインデックスブロックを含む領域を広げながら、その領域にと重なるテキストブロックの数をカウントする。そして重なるテキストブロックの数が一定数以上になったときの領域を、その項目の部分パターン範囲として決定する。 The number and layout of text blocks around the index block change depending on the type of document. Therefore, in the present embodiment, in order to determine the size of the partial pattern range, the area including the index block of the target item is gradually expanded, and the number of text blocks overlapping the area is counted. Then, the area when the number of overlapping text blocks exceeds a certain number is determined as the partial pattern range of the item.

図１５は、本実施形態における部分パターン範囲の決定方法を説明するための図である。図１５（ａ）における、実線の矩形はタイトルのインデックスブロック１０００であり、一点鎖線の矩形は、タイトルの部分パターン範囲を決定するための領域である。領域は、それぞれ、初期領域１５００、２段階目の領域１５０１、最大領域１５０２を示している。図１５（ａ）では、タイトルの項目における部分パターン範囲を決定するための領域が段階的に変更される様子を示している。初期領域から最大領域まで段階的に領域を広げながら、その領域と重なるインデックスブロックを除くテキストブロックをカウントする。そして、カウントされたテキストブロックが所定の数以上になったときの一点鎖線の矩形で示す領域を、その項目の部分パターン範囲として決定する。なお、所定の数は、１個以上であることが好ましい。本実施形態では、所定の数が５であるものとして説明する。 FIG. 15 is a diagram for explaining a method of determining a partial pattern range in the present embodiment. In FIG. 15A, the solid line rectangle is the title index block 1000, and the alternate long and short dash line rectangle is an area for determining the partial pattern range of the title. The regions indicate the initial region 1500, the second stage region 1501, and the maximum region 1502, respectively. FIG. 15A shows how the area for determining the partial pattern range in the title item is changed stepwise. While gradually expanding the area from the initial area to the maximum area, the text blocks excluding the index block that overlaps the area are counted. Then, the area indicated by the rectangle of the alternate long and short dash line when the number of counted text blocks becomes a predetermined number or more is determined as the partial pattern range of the item. The predetermined number is preferably one or more. In this embodiment, it is assumed that the predetermined number is 5.

本実施形態の部分パターン範囲の決定方法について具体的に説明する。はじめに、初期領域１５００と少しでも重なっているテキストブロックの数をカウントする。この場合、インデックスブロック１０００以外のテキストブロックが存在しないため、次の段階へ進む。 A method for determining the partial pattern range of the present embodiment will be specifically described. First, the number of text blocks that overlap the initial area 1500 even a little is counted. In this case, since there is no text block other than the index block 1000, the process proceeds to the next stage.

次に、領域を広げて、２段階目の領域１５０１と少しでも重なっているテキストブロックをカウントする。図１５（ｂ）は、部分パターン範囲を決定するための領域を２段階目の領域１５０１とした場合の図である。図１５（ｂ）に示すように２段階目の領域１５０１とは、テキストブロック１００１、１００８〜１０１３が重なる。このため２段階目の領域１５０１と重なるテキストブロックは７個とカウントされる。そして重なるテキストブロックの数が所定の数である５以上であると判定される。このため、タイトルの部分パターン範囲については２段階目の領域１５０１が示す位置およびサイズに決定される。このため部分パターン範囲に少なくとも一部が含まれるテキストブロック１００１、１００８〜１０１３と、インデックスブロック１０００とからなるレイアウトが、タイトルの部分レイアウトとして決定される。 Next, the area is expanded, and the text blocks that overlap the second stage area 1501 even slightly are counted. FIG. 15B is a diagram for the case where the area for determining the partial pattern range is the second stage area 1501. As shown in FIG. 15B, the text blocks 1001 and 1008 to 1013 overlap the second stage area 1501. Therefore, the number of text blocks overlapping the second stage area 1501 is counted as seven, and this number is determined to be equal to or greater than the predetermined number of 5. Accordingly, the partial pattern range of the title is determined to have the position and size indicated by the second stage area 1501. As a result, the layout consisting of the text blocks 1001 and 1008 to 1013, each of which is at least partly included in the partial pattern range, and the index block 1000 is determined as the partial layout of the title.
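The stepwise expansion can be sketched as a loop over candidate areas ordered from the initial area to the maximum area. The function name `decide_pattern_range`, the block representation `(x, y, width, height)`, and the fallback to the maximum area when the predetermined number is never reached are assumptions; the text does not state what happens in that last case.

```python
def overlaps(a, b):
    """True if two axis-aligned rectangles share a region of positive area."""
    return (min(a[0] + a[2], b[0] + b[2]) > max(a[0], b[0]) and
            min(a[1] + a[3], b[1] + b[3]) > max(a[1], b[1]))

def decide_pattern_range(index_block, text_blocks, regions, min_count=5):
    """Pick the first area that overlaps at least min_count other blocks.

    regions is ordered from the initial area to the maximum area.
    Returns (chosen area, overlapping blocks excluding the index block).
    """
    neighbours = []
    for region in regions:
        neighbours = [b for b in text_blocks
                      if b != index_block and overlaps(b, region)]
        if len(neighbours) >= min_count:
            return region, neighbours
    # Assumption: fall back to the maximum area if the count is never reached.
    return regions[-1], neighbours

index_block = (40, 10, 20, 10)
blocks = [index_block, (0, 40, 10, 10), (20, 40, 10, 10), (50, 40, 10, 10),
          (70, 40, 10, 10), (90, 40, 5, 10), (150, 150, 10, 10)]
regions = [(30, 0, 40, 30), (0, 0, 100, 60), (0, 0, 200, 200)]
chosen, neighbours = decide_pattern_range(index_block, blocks, regions)
print(chosen, len(neighbours))
```

The index block together with the returned neighbouring blocks then forms the partial layout for the item.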

または、項目によって、周囲のテキストブロックの数は異なり、記載内容によるテキストブロックのレイアウトの変化が少ない領域は異なる。このため、例えば、項目の属性に応じて部分パターン範囲のサイズを異ならせてもよい。つまり、項目の属性に応じた部分パターンのサイズを予め設定してもよい。 Alternatively, the number of surrounding text blocks differs depending on the item, and the area where the layout of the text blocks does not change much depending on the description content differs. Therefore, for example, the size of the partial pattern range may be different depending on the attribute of the item. That is, the size of the partial pattern may be set in advance according to the attribute of the item.

項目がタイトルの場合、タイトルのテキストブロックの近傍にはテキストブロックが存在しないことが多いという特徴がある。また、タイトルは、文書の記載内容の変化によるテキストブロックのレイアウトの変化が少ない文書の上部に存在するという特徴がある。このため、図１０（ｃ）の部分パターン範囲１００７に示すように、項目が文書のタイトルであれば、Ｘ方向は画像幅全体が収まり、Ｙ方向も画像の約４分の１が収まるような領域が部分パターン範囲として決定されてもよい。 When the item is a title, there is often no text block in the vicinity of the text block of the title. Further, the title is characterized in that it exists at the upper part of the document in which the layout of the text block does not change much due to the change in the description content of the document. Therefore, as shown in the partial pattern range 1007 of FIG. 10C, if the item is the title of the document, the entire image width fits in the X direction, and about a quarter of the image fits in the Y direction. The region may be determined as a partial pattern range.

以上説明したように本実施形態では、文書に応じて部分パターン範囲が決定される。このため、文書に応じて適切な部分パターン範囲によって、インデックスブロック推定処理の精度を向上させることができる。 As described above, in the present embodiment, the partial pattern range is determined according to the document. Therefore, the accuracy of the index block estimation process can be improved by the appropriate partial pattern range according to the document.

＜実施形態３＞ <Embodiment 3>
実施形態１では、部分パターンを利用して導出された一致度が最大となる位置をＸＹ候補位置として決定し、ＸＹ候補位置の一致度が所定の閾値以上であれば、ＸＹ候補位置に基づき処理対象の項目のインデックスブロックのある位置を推定する方法を説明した。 In the first embodiment, a method has been described in which the position where the degree of matching derived by using the partial pattern is maximized is determined as the XY candidate position and, if the degree of matching at the XY candidate position is equal to or greater than a predetermined threshold value, the position of the index block of the item to be processed is estimated based on the XY candidate position.

しかしながら、入力文書には、登録文書の部分レイアウトと配置が類似したテキストブロックを含む領域が複数存在することがある。入力文書内に部分レイアウトと類似する領域が複数存在する場合、実施形態１の方法では、入力文書内における処理対象の項目のインデックスブロックの推定に失敗してしまうことがある。 However, the input document may have a plurality of areas containing text blocks that are similar in arrangement to the partial layout of the registered document. When there are a plurality of areas similar to the partial layout in the input document, the method of the first embodiment may fail to estimate the index block of the item to be processed in the input document.

そこで本実施形態では、処理対象の項目の部分レイアウトに類似した領域が入力文書内に複数存在する場合であっても、入力文書内のインデックスブロックの位置を適切に推定する方法について説明する。なお、本実施形態については、実施形態１からの差分を中心に説明する。特に明記しない部分については実施形態１と同じ構成および処理である。 Therefore, in the present embodiment, a method of appropriately estimating the position of the index block in the input document will be described even when a plurality of areas similar to the partial layout of the item to be processed exist in the input document. In addition, this embodiment will be described mainly on the difference from the first embodiment. The parts not specified in particular have the same configuration and processing as in the first embodiment.

図１６は、本実施形態におけるＳ５０７のインデックスブロック推定処理を説明するためのフローチャートである。本実施形態におけるインデックスブロック推定処理の詳細について、図１６のフローチャートに従い説明する。Ｓ１６００〜Ｓ１６０４はＳ８００〜Ｓ８０４と同一であるため説明を省略する。 FIG. 16 is a flowchart for explaining the index block estimation process of S507 in the present embodiment. The details of the index block estimation process in the present embodiment will be described with reference to the flowchart of FIG. Since S1600 to S1604 are the same as S800 to S804, the description thereof will be omitted.

Ｓ１６０５において画像処理部３０５は、Ｓ１６０４で導出した一致度が所定の閾値以上となるスキャン画像内のＸＹ位置を決定する。本ステップの結果、複数のＸＹ位置が決定されない場合もあるが、便宜的に本ステップによって決定されるＸＹ位置をＸＹ候補位置群と呼ぶ。 In S1605, the image processing unit 305 determines the XY position in the scanned image in which the degree of coincidence derived in S1604 is equal to or greater than a predetermined threshold value. As a result of this step, a plurality of XY positions may not be determined, but for convenience, the XY positions determined by this step are referred to as XY candidate position groups.

図１７は、インデックスブロックとその周囲のブロックからなる部分レイアウトと類似する領域が複数存在する登録文書の例を示す図である。図１７（ａ）は、登録文書の一例を示す図である。図１７（ｂ）は、図１７（ａ）の登録文書における「見積日付（ＱｕｏｔａｔｉｏｎＤａｔｅ）」の項目に対応する文字列を含むテキストブロック１７０５をインデックスブロックとした場合の部分パターンを示す図である。図１７（ｂ）において、一点鎖線の矩形は、「見積日付」の項目の部分パターン範囲１７００を示し、実線の矩形で表されるテキストブロック１７０１〜１７０６は、「見積日付」の項目の部分レイアウトを構成するテキストブロックを示している。図１６のフローチャートの説明では、「見積日付」を処理対象の項目とした場合の処理について説明する。 FIG. 17 is a diagram showing an example of a registered document in which a plurality of areas similar to the partial layout composed of an index block and its surrounding blocks exist. FIG. 17A is a diagram showing an example of a registered document. FIG. 17B is a diagram showing a partial pattern when the text block 1705, which includes the character string corresponding to the item "Quotation Date" in the registered document of FIG. 17A, is used as the index block. In FIG. 17B, the alternate long and short dash line rectangle indicates the partial pattern range 1700 of the "quotation date" item, and the text blocks 1701 to 1706 represented by solid line rectangles are the text blocks that constitute the partial layout of the "quotation date" item. The description of the flowchart of FIG. 16 covers the processing performed when "quotation date" is the item to be processed.

図１８は、入力文書を説明するための図である。図１８（ａ）は、入力文書を示す図であり、本フローチャートの説明では、この入力文書がスキャンされた結果得られたスキャン画像に対して、インデックスブロック推定処理が行われるものとして説明する。また、Ｓ５０４の文書マッチングにより、図１８（ａ）の入力文書に類似する文書は、図１７の登録文書が特定されたものとして説明する。 FIG. 18 is a diagram for explaining an input document. FIG. 18A is a diagram showing the input document, and the description of this flowchart assumes that the index block estimation process is performed on the scanned image obtained by scanning this input document. The description also assumes that the document matching of S504 has specified the registered document of FIG. 17 as the document similar to the input document of FIG. 18A.

図１８（ｂ）〜（ｅ）は、それぞれ、図１８（ａ）の入力文書のスキャン画像に対してブロックセレクション処理を行った結果検出されたテキストブロックを表す画像に、図１７（ｂ）の「見積日付」の部分パターンを重畳した図である。図１８（ｂ）〜（ｅ）の夫々の図における矩形は、部分パターンを示す。即ち、実線の矩形は、部分レイアウトを構成するテキストブロックであり、一点鎖線の矩形は部分パターン範囲である。 FIGS. 18 (b) to 18 (e) are each a diagram in which the partial pattern of "quotation date" in FIG. 17B is superimposed on an image representing the text blocks detected by performing block selection processing on the scanned image of the input document of FIG. 18A. The rectangles in each of FIGS. 18 (b) to 18 (e) indicate the partial pattern. That is, the solid line rectangles are the text blocks constituting the partial layout, and the alternate long and short dash line rectangle is the partial pattern range.

図１８（ｂ）〜（ｅ）で示す、部分パターンの位置は、Ｓ１６０４で導出した一致度が所定の閾値以上となったときの位置である。このため部分レイアウトを構成する実線の矩形で表したテキストブロックのうち、インデックスブロックのＸＹ位置１８０１〜１８０４が、本ステップの処理の結果、ＸＹ候補位置群として決定されている。 The positions of the partial patterns shown in FIGS. 18 (b) to 18 (e) are positions when the degree of coincidence derived in S1604 is equal to or greater than a predetermined threshold value. Therefore, among the text blocks represented by the solid rectangles constituting the partial layout, the XY positions 1801 to 1804 of the index block are determined as the XY candidate position groups as a result of the processing of this step.

図１８（ａ）に示す入力文書のように、単純なテキストブロックの配置が繰り返し存在する文書において、その繰り返して配置されているテキストブロックの中にインデックスブロックが存在する場合には、一致度が閾値以上となるＸＹ位置が複数決定される。このため、図１８（ａ）に示す入力文書に対して、本ステップの処理がされた結果決定されるＸＹ候補位置群の数は２以上となる。 In a document in which a simple arrangement of text blocks is repeated, as in the input document shown in FIG. 18A, a plurality of XY positions at which the degree of matching is equal to or greater than the threshold value are determined when the index block is among the repeatedly arranged text blocks. Therefore, for the input document shown in FIG. 18A, the number of positions in the XY candidate position group determined as a result of the processing of this step is two or more.

Ｓ１６０６において画像処理部３０５は、Ｓ１６０５で決定したＸＹ候補位置群の数に応じて処理を切り替える。ＸＹ候補位置群の数が１個であれば、Ｓ１６１０に進み、ＸＹ候補位置群の数が０個であれば、Ｓ１６１２に進む。Ｓ１６１２の処理はＳ８０９と同一であるため説明を省略する。 In S1606, the image processing unit 305 switches the processing according to the number of XY candidate position groups determined in S1605. If the number of XY candidate position groups is 1, the process proceeds to S1610, and if the number of XY candidate position groups is 0, the process proceeds to S1612. Since the processing of S1612 is the same as that of S809, the description thereof will be omitted.

ＸＹ候補位置群の数が２個以上である場合はＳ１６０７に進む。Ｓ１６０７において画像処理部３０５は、登録文書内の位置であって、処理対象の項目の部分レイアウトとの一致度が所定の閾値以上となる位置である類似位置（群）を取得する。 If the number of XY candidate position groups is two or more, the process proceeds to S1607. In S1607, the image processing unit 305 acquires a similar position (group) that is a position in the registered document and the degree of coincidence with the partial layout of the item to be processed is equal to or more than a predetermined threshold value.

登録文書内の位置に、処理対象の項目の部分パターンに含まれる部分レイアウトを重畳させてテキストブロックの一致度の導出を行い、一致度が所定の閾値以上となる登録文書内のＸＹ位置が「類似位置」として決定される。登録文書内のテキストブロックと部分レイアウトのテキストブロックとの一致度の算出方法は、Ｓ１６０２〜Ｓ１６０４と同様の方法で導出されればよい。即ち、入力文書を対象としていたところを、登録文書を対象として同様の手順で一致度を導出すればよい。 The partial layout included in the partial pattern of the item to be processed is superimposed on positions in the registered document to derive the degree of matching of the text blocks, and each XY position in the registered document at which the degree of matching is equal to or greater than a predetermined threshold value is determined as a "similar position". The degree of matching between the text blocks in the registered document and the text blocks of the partial layout may be derived by the same method as in S1602 to S1604. That is, the degree of matching may be derived by the same procedure, with the registered document as the target instead of the input document.

図１９は、登録文書内の類似位置を説明するための図である。図１９（ａ）は、図１７（ａ）と同一の登録文書を示す図である。図１９（ｂ）〜（ｅ）は、それぞれ、図１９（ａ）の登録文書のスキャン画像に対してブロックセレクション処理を行った結果検出されたテキストブロックを表す画像に、図１７（ｂ）の「見積日付」の部分パターンを重畳した図である。図１９（ｂ）〜（ｅ）の夫々の図における矩形は、部分パターンを示す。即ち、実線の矩形は、部分レイアウトを構成するテキストブロックであり、一点鎖線の矩形は部分パターン範囲である。 FIG. 19 is a diagram for explaining similar positions in the registered document. FIG. 19 (a) is a diagram showing the same registered document as in FIG. 17 (a). FIGS. 19 (b) to 19 (e) are each a diagram in which the partial pattern of "quotation date" in FIG. 17B is superimposed on an image representing the text blocks detected by performing block selection processing on the scanned image of the registered document of FIG. 19 (a). The rectangles in each of FIGS. 19 (b) to 19 (e) indicate the partial pattern. That is, the solid line rectangles are the text blocks constituting the partial layout, and the alternate long and short dash line rectangle is the partial pattern range.

The positions of the partial pattern in FIGS. 19(b) to 19(e) are the positions at which the derived degree of matching is equal to or greater than the predetermined threshold. Accordingly, the XY positions of the index block, among the text blocks constituting the partial layout, are determined as the similar position group 1901 to 1904. In this step, the position information of the similar position group of the item to be processed is acquired. The similar position group 1901 to 1904 also includes, as similar position 1902, the XY position of the index block 1705 at the time of registration shown in FIG. 17(b).

Note that the process of determining the similar positions in the registered document does not have to be performed in S1607. For example, when a document is registered, the similar position group may be determined after the partial pattern has been determined for each item, and the information on the similar position group may be stored in advance as part of the extraction rule shown in FIG. 7. In that case, in S1607 the similar position group need only be acquired as one of the stored extraction rules for the item to be processed.

In S1608, the image processing unit 305 associates the similar position group of the registered document acquired in S1607 with the XY candidate position group in the input document determined in S1605. Specifically, the XY candidate position group, sorted under the same condition as the similar position group, is associated with the similar position group sorted by Y position, first in order from one side in the Y direction and then in order from the other side.

FIG. 20 is a diagram for explaining the processing of this step. The numerical values in the tables are the reference numerals of the positions in the documents shown in FIG. 18 or FIG. 19.

FIG. 20(a) shows the association when the number of similar positions and the number of XY candidate positions are the same, as in FIGS. 18 and 19. Column 2001 lists the similar position group sorted by Y position. Column 2002 lists the XY candidate position group, sorted by Y position, associated with the similar position group of column 2001 in order from the top. Column 2003 lists the XY candidate position group, sorted by Y position, associated with the similar position group of column 2001 in order from the bottom. In FIG. 20(a), each similar position is associated with the same XY candidate position in column 2002 and in column 2003.

FIG. 20(b) illustrates the association in this step when there are fewer similar positions than XY candidate positions. It shows the association for a case where, for example, the degree of matching obtained when the partial pattern is superimposed at the position of the registered document shown in FIG. 19(e) is less than the threshold, so that only similar positions 1901 to 1903 are acquired in S1607. Column 2011 lists the similar position group sorted by Y position. Column 2012 lists the XY candidate positions associated with the similar position group of column 2011 in order from the top. Column 2013 lists the XY candidate positions associated with the similar position group of column 2011 in order from the bottom. In FIG. 20(b), the association from the top and the association from the bottom assign different XY candidate positions to the similar positions.

FIG. 20(c) illustrates the association in this step when there are more similar positions than XY candidate positions. It shows the association for a case where the degree of matching obtained when the partial pattern is superimposed at the position of the input document shown in FIG. 18(e) is less than the threshold, so that only XY positions 1801 to 1803 are determined as the XY candidate position group in S1605. Column 2021 lists the similar position group sorted by Y position. Column 2022 lists the XY candidate positions associated with the similar position group of column 2021 in order from the top. Column 2023 lists the XY candidate positions associated with the similar position group of column 2021 in order from the bottom. The two associations give different results: in the association from the top, no XY candidate position is found for similar position 1904, and in the association from the bottom, no XY candidate position is found for similar position 1901.
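The two-directional association of S1608 can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name `associate` is hypothetical, both lists are assumed to be already sorted by Y position from top to bottom, and the reference numerals of FIGS. 18 and 19 stand in for the actual XY coordinates.

```python
def associate(similar, candidates):
    """Pair similar positions with XY candidate positions in both directions.

    Both lists are assumed to be sorted by Y position (top first).
    Returns (from_top, from_bottom); each maps a similar position to its
    associated candidate, or to None when no candidate is left for it.
    """
    n, m = len(similar), len(candidates)

    # From the top: the i-th similar position gets the i-th candidate.
    from_top = {s: (candidates[i] if i < m else None)
                for i, s in enumerate(similar)}

    # From the bottom: align the two lists at their last elements.
    shift = m - n
    from_bottom = {}
    for i, s in enumerate(similar):
        j = i + shift
        from_bottom[s] = candidates[j] if 0 <= j < m else None
    return from_top, from_bottom


# Case of FIG. 20(b): fewer similar positions than candidates.
top, bottom = associate([1901, 1902, 1903], [1801, 1802, 1803, 1804])
print(top[1902], bottom[1902])  # 1802 1803

# Case of FIG. 20(c): more similar positions than candidates.
top, bottom = associate([1901, 1902, 1903, 1904], [1801, 1802, 1803])
print(top[1902], bottom[1902])  # 1802 1801
print(top[1904], bottom[1901])  # None None
```

When the two lists have the same length, as in FIG. 20(a), the shift is zero and both directions yield the same pairing.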

In S1609, the image processing unit 305 determines one XY candidate position from the XY candidate position group determined in S1605, based on the result of the association performed in S1608.

In some cases, the association from the top and the association from the bottom performed in S1608 give the same result, as shown in FIG. 20(a). In this case, among the XY candidate position group, the XY position associated with the similar position that indicates the position of the index block at the time of registration is determined as the one XY candidate position. In the example of FIG. 20(a), XY position 1802, which is associated with similar position 1902 indicating the position of the index block, is determined as the one XY candidate position.

In other cases, the association from the top and the association from the bottom performed in S1608 give different results, as shown in FIGS. 20(b) and 20(c). In this case, the XY position of the input document associated with the similar position indicating the position of the index block is first determined for the association from the top. The XY position of the input document associated with that similar position is then determined for the association from the bottom.

In the example of FIG. 20(b), XY position 1802 and XY position 1803, which are associated with similar position 1902 indicating the position of the index block, are determined. In the example of FIG. 20(c), XY position 1802 and XY position 1801, which are associated with similar position 1902, are determined. Of the two XY positions thus determined, the one with the higher degree of matching derived in S1604 is determined as the one XY candidate position from the XY candidate position group. Alternatively, one of the two XY positions may be selected without using the degree of matching. For example, the two XY positions may be displayed so that an instruction can be received from the user, and which association to use, from the top or from the bottom, may be stored for each item and reused.
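The selection in S1609 can be illustrated with a short Python sketch. The function name `choose_candidate`, the dictionary-based inputs, and the degree-of-matching values are assumptions made for illustration; the values in particular are made up and do not come from the embodiment.

```python
def choose_candidate(from_top, from_bottom, index_similar, degree):
    """Pick one XY candidate position for the index block.

    from_top / from_bottom map each similar position to the candidate
    associated with it (or None) in the respective direction.
    index_similar is the similar position of the registered index block.
    degree maps each candidate to its degree of matching (as in S1604).
    """
    a = from_top.get(index_similar)
    b = from_bottom.get(index_similar)
    options = [p for p in (a, b) if p is not None]
    if not options:
        return None  # neither direction produced a candidate
    # If both directions agree, max() returns that single position.
    # On an exact tie between two different positions, the one from the
    # top is kept (an arbitrary choice made for this sketch).
    return max(options, key=lambda p: degree[p])


# Associations of FIG. 20(b), with hypothetical degrees of matching.
from_top = {1901: 1801, 1902: 1802, 1903: 1803}
from_bottom = {1901: 1802, 1902: 1803, 1903: 1804}
degree = {1801: 0.82, 1802: 0.95, 1803: 0.88, 1804: 0.81}
print(choose_candidate(from_top, from_bottom, 1902, degree))  # 1802
```

With these made-up values the candidate from the top, 1802, is chosen because its degree of matching is higher than that of 1803.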

When one XY candidate position has been determined from the XY candidate position group, the process proceeds to S1610. In S1610, as in S807, the XY candidate position is taken as the estimated position of the index block to be processed, and the index block of the item to be processed is estimated from the text blocks of the scanned image. S1611 is the same as S808 and S1613 is the same as S810, so their description is omitted.

As described above, in the present embodiment, when the input document contains a plurality of XY candidate positions at which the degree of matching is equal to or greater than the threshold, one XY candidate position is determined after those positions have been associated with the similar position group of the registered document, that is, the positions at which the degree of matching with the partial pattern is equal to or greater than the threshold. Therefore, the accuracy of the index block estimation process can be improved even when the document contains a plurality of areas similar to the partial layout consisting of the index block and its surrounding text blocks.

<Other Embodiments>
In the above-described embodiment, an example was described in which the image forming apparatus 100 alone performs the processing of each step of the flowchart of FIG. 4. Alternatively, all or part of this processing may be performed by another image processing apparatus on the system 105 that has the functions shown in FIG. 3.

For example, the scan processing may be executed by the image forming apparatus 100, and the scanned image may be transmitted to the terminal 101 via the network. The terminal 101 may have the same functions as the image processing unit 305 and execute the index extraction processing. In this case, the terminal 101 returns the index extraction result to the image forming apparatus 100, and the image forming apparatus 100 generates and transmits a file based on the acquired index extraction result.

The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

100 Image forming apparatus
305 Image processing unit
111 CPU

Claims

An image processing apparatus comprising:
acquisition means for acquiring image data of a document;
detection means for detecting text blocks in an image represented by the image data;
specifying means for specifying, from information that defines a layout of text blocks for each document of a document group, a document corresponding to the image as a registered document based on a predetermined rule;
estimation means for estimating a text block corresponding to an item to be processed in the image, based on a partial layout that includes, among the text blocks defined in the registered document, the text block corresponding to the item to be processed and at least one text block other than that text block; and
extraction means for extracting a character string in the estimated text block as a character string corresponding to the item to be processed.

The image processing apparatus according to claim 1, wherein the estimation means performs the estimation by superimposing the partial layout at an arbitrary position within a search range in the image and deriving a degree of matching based on the size of the area in which the text blocks included in the partial layout overlap the text blocks in the image.

The image processing apparatus according to claim 2, wherein the estimation means derives the degree of matching using, as the search range, a predetermined area that includes the position in the image corresponding to the position of the text block corresponding to the item to be processed in the registered document.

The image processing apparatus according to claim 2 or 3, wherein the estimation means determines, based on differences in vertical position between the text blocks included in the partial layout and the text blocks in the search range, a vertical position group for estimating the text block corresponding to the item to be processed.

The image processing apparatus according to claim 4, wherein the estimation means derives the degree of matching at each position obtained when the partial layout is superimposed along the horizontal direction at the vertical positions of the vertical position group within the search range.

The image processing apparatus according to any one of claims 2 to 5, wherein the estimation means performs the estimation based on the position in the image at which the degree of matching is equal to or greater than a threshold and is maximized.

The image processing apparatus according to claim 5, wherein the estimation means:
determines candidate positions in the image at which the degree of matching is equal to or greater than a threshold;
when the number of the candidate positions is one, determines that candidate position as the position in the image for performing the estimation; and
when the number of the candidate positions is two or more, acquires, as similar positions, positions in the registered document at which the degree of matching, derived by the same derivation method when the text blocks included in the partial layout are superimposed at an arbitrary position in the registered document, is equal to or greater than the threshold, and determines the position in the image for performing the estimation by associating the candidate positions with the similar positions.

The image processing apparatus according to claim 7, wherein, when the number of the candidate positions is two or more and is the same as the number of the similar positions, the estimation means associates the candidate positions and the similar positions, arranged under the same condition, in order from one side, and determines the candidate position associated with the similar position corresponding to the position of the text block corresponding to the item to be processed as the position in the image for performing the estimation.

The image processing apparatus according to claim 7 or 8, wherein, when the number of the candidate positions is two or more and differs from the number of the similar positions:
a first position is acquired, the first position being the candidate position associated with the similar position corresponding to the position of the text block corresponding to the item to be processed when the candidate positions and the similar positions, arranged under the same condition, are associated in order from one side;
a second position is acquired, the second position being the candidate position associated with that similar position when the candidate positions and the similar positions, arranged under the same condition, are associated in order from the other side; and
whichever of the first position and the second position satisfies a predetermined condition is determined as the position in the image for performing the estimation.

The image processing apparatus according to any one of claims 2 to 9, wherein the estimation means places the text block corresponding to the item to be processed in the registered document at the position in the image determined based on the degree of matching and, when a text block in the image that overlaps the placed text block satisfies a predetermined condition, estimates the overlapping text block to be the text block corresponding to the item to be processed in the image.

The image processing apparatus according to claim 10, wherein the predetermined condition is that the degree of overlap between the text block corresponding to the item to be processed and the overlapping text block is equal to or greater than a predetermined value and the distance between their vertices is within a certain range.

The image processing apparatus according to any one of claims 2 to 11, wherein a predetermined range is set based on the text block corresponding to the item to be processed in the registered document, and the degree of matching is adjusted so that, when the partial layout is superimposed on the image, it becomes lower as the area of the text blocks that are included in the predetermined range in the image and do not overlap the text blocks included in the partial layout becomes larger.

The image processing apparatus according to any one of claims 1 to 12, wherein the text blocks included in the partial layout are determined as the text block corresponding to the item to be processed in the registered document and the text blocks included in a predetermined range based on that text block.

The predetermined range is
12. The image processing apparatus according to.

The image processing apparatus according to any one of claims 1 to 14, wherein the specifying means specifies the registered document based on the similarity of text block layouts, by comparing the layout of the text blocks in the image detected by the detection means with the layouts of the text blocks of the document group.

6. Image processing equipment.

The image processing apparatus according to any one of claims 1 to 16, further comprising correction means for correcting the image or the text blocks in the image, wherein the estimation means performs the estimation based on the text blocks in the image as corrected.

The image processing apparatus according to claim 17, wherein the correction by the correction means includes at least one of tilt correction, rotation correction, and alignment with the registered document.

The image processing apparatus according to any one of claims 1 to 18, wherein the extraction means performs OCR processing on the estimated text block corresponding to the item to be processed in the image, and extracts the character string obtained as a result of the OCR processing as the character string corresponding to the item to be processed.

The image processing apparatus according to any one of claims 1 to 19, further comprising storage means for storing, for each document of the document group, a rule for performing the estimation, wherein the estimation means acquires the rule of the registered document specified by the specifying means and performs the estimation based on the rule.

The image processing apparatus according to any one of claims 1 to 20, further comprising setting means for setting a property of the image data based on the character string corresponding to the item to be processed extracted by the extraction means.

An image processing method comprising:
an acquisition step of acquiring image data of a document;
a detection step of detecting text blocks in an image represented by the image data;
a specifying step of specifying, from information that defines a layout of text blocks for each document of a document group, a document corresponding to the image as a registered document based on a predetermined rule;
an estimation step of estimating a text block corresponding to an item to be processed in the image, based on a partial layout that includes, among the text blocks defined in the registered document, the text block corresponding to the item to be processed and at least one text block other than that text block; and
an extraction step of extracting a character string in the estimated text block as a character string corresponding to the item to be processed.

A program for causing a computer to function as each means of the image processing apparatus according to any one of claims 1 to 21.