JP2021149459A

JP2021149459A - Image processor, control method, and control program

Info

Publication number: JP2021149459A
Application number: JP2020048290A
Authority: JP
Inventors: 圭祐大島; Keisuke Oshima; 裕紀谷崎; Hiroki Tanizaki; 貴彦深澤; Takahiko Fukazawa
Original assignee: PFU Ltd
Current assignee: PFU Ltd
Priority date: 2020-03-18
Filing date: 2020-03-18
Publication date: 2021-09-27

Abstract

To provide an image processor that can more correctly output character information described in an input form image. SOLUTION: The image processor includes: a storage unit 210 that stores, for each of a plurality of form layout types, a partial image that failed to be recognized in the past or a feature amount of that partial image, position information, and character information; a layout detection unit 222 that detects the form layout type on the basis of an input form image; an extraction unit that extracts an input partial image from the input form image on the basis of the position information corresponding to the detected form layout type; and an output unit 203 that, when the input partial image is similar to the partial image stored for the detected form layout type, or when the feature amount of the input partial image is similar to the feature amount stored for the detected form layout type, treats the character information stored for that form layout type as being described in the input form image and outputs that character information. SELECTED DRAWING: Figure 6

Description

The present invention relates to an image processing device, a control method, and a control program, and more particularly to an image processing device, a control method, and a control program for processing an input form image.

In companies where staff manually convert forms such as invoices into data, the workload becomes heavy when a huge number of forms must be converted, so there is growing demand for making form data entry more efficient. To reduce the workload on staff, an image processing device that converts forms into data is expected to correctly output the character information described in an input form image.

An information processing device has been disclosed that determines that image data is image data of a specific document, such as a FAX image of a purchase order, when a pattern detected from the image data, such as the orderer's logo or symbol mark, is similar to a predetermined pattern (Patent Document 1). When this information processing device determines that the image data is image data of a specific document, it identifies the company name associated with the predetermined pattern from a company master DB.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2018-42067 (JP-A-2018-42067)

Image processing devices are expected to output the character information described in an input form image more correctly.

An object of the present invention is to provide an image processing device, a control method, and a control program capable of more correctly outputting the character information described in an input form image.

An image processing device according to one aspect of the present invention includes: a storage unit in which a partial image that failed to be recognized in the past or a feature amount of that partial image, position information of the partial image within the form, and character information corresponding to the partial image are stored for each of a plurality of form layout types; an acquisition unit that acquires an input form image; a layout detection unit that detects the form layout type based on the input form image; an extraction unit that extracts an input partial image from the input form image based on the position information stored in the storage unit for the detected form layout type; and an output unit that, when the input partial image is similar to the partial image stored in the storage unit for the detected form layout type, or when the feature amount of the input partial image is similar to the feature amount of the partial image stored in the storage unit for the detected form layout type, treats the character information stored in the storage unit for the detected form layout type as being described in the input form image and outputs that character information.
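The flow from layout type to output character information can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the publication: the `Entry` record, the `lookup_text` function, the mean-absolute-difference similarity measure, and the 0.9 threshold are assumptions chosen only to make the example runnable. Layout detection itself is assumed to have already produced `layout_id`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Image = List[List[int]]          # grayscale image as rows of 0-255 values
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


@dataclass
class Entry:
    """One stored record for a layout type (hypothetical structure)."""
    partial_image: Image  # region whose recognition failed in the past
    position: Box         # where that region sits in the form
    text: str             # character information registered for the region


def crop(image: Image, box: Box) -> Image:
    """Cut the input partial image out of the input form image."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def similarity(a: Image, b: Image) -> float:
    """Assumed measure: 1 - (mean absolute pixel difference / 255)."""
    if not a or len(a) != len(b):
        return 0.0
    if any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return 0.0
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    if not diffs:
        return 0.0
    return 1.0 - sum(diffs) / (255.0 * len(diffs))


def lookup_text(form: Image, layout_id: str,
                storage: Dict[str, List[Entry]],
                threshold: float = 0.9) -> Optional[str]:
    """Return the stored text if a stored partial image matches the form."""
    for entry in storage.get(layout_id, []):
        patch = crop(form, entry.position)
        if similarity(patch, entry.partial_image) >= threshold:
            return entry.text
    return None  # no match: fall back to ordinary character recognition
```

The point of the design is that the stored character information is returned without re-running recognition on a region that is known to have failed before; a feature-based comparison could replace `similarity` without changing the surrounding flow.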

A control method according to one aspect of the present invention is a control method for an image processing device having a storage unit and an output unit, in which the image processing device: stores, in the storage unit and for each of a plurality of form layout types, a partial image that failed to be recognized in the past or a feature amount of that partial image, position information of the partial image within the form, and character information corresponding to the partial image; acquires an input form image; detects the form layout type based on the input form image; extracts an input partial image from the input form image based on the position information stored in the storage unit for the detected form layout type; and, when the input partial image is similar to the partial image stored in the storage unit for the detected form layout type, or when the feature amount of the input partial image is similar to the feature amount of the partial image stored in the storage unit for the detected form layout type, treats the character information stored in the storage unit for the detected form layout type as being described in the input partial image and outputs that character information from the output unit.

A control program according to one aspect of the present invention is a control program for a computer having a storage unit and an output unit, and causes the computer to: store, in the storage unit and for each of a plurality of form layout types, a partial image that failed to be recognized in the past or a feature amount of that partial image, position information of the partial image within the form, and character information corresponding to the partial image; acquire an input form image; detect the form layout type based on the input form image; extract an input partial image from the input form image based on the position information stored in the storage unit for the detected form layout type; and, when the input partial image is similar to the partial image stored in the storage unit for the detected form layout type, or when the feature amount of the input partial image is similar to the feature amount of the partial image stored in the storage unit for the detected form layout type, treat the character information stored in the storage unit for the detected form layout type as being described in the input partial image and output that character information from the output unit.

According to the present invention, the image processing device, the control method, and the control program can more correctly output the character information described in an input form image.

A diagram showing the schematic configuration of the image processing system 1 according to an embodiment.
A diagram showing an example of the data structure of the layout table.
A diagram showing an example of the data structure of the dictionary table.
A diagram showing the schematic configuration of the second storage device 210 and the second processing circuit 220.
A flowchart showing an example of the operation of the image reading process.
A flowchart showing an example of the operation of the recognition process.
A flowchart showing an example of the operation of the registration process.
A schematic diagram showing an example of the input form image 800.
A block diagram showing the schematic configuration of another second processing circuit 230.

Hereinafter, an image processing device, a control method, and a control program according to one aspect of the present invention will be described with reference to the drawings. Note, however, that the technical scope of the present invention is not limited to these embodiments and extends to the inventions described in the claims and their equivalents.

FIG. 1 is a diagram showing the schematic configuration of an image processing system 1 according to an embodiment. As shown in FIG. 1, the image processing system 1 includes an image reading device 100 and an information processing device 200.

The image reading device 100 is, for example, a scanner. The image reading device 100 is connected to the information processing device 200. The information processing device 200 is an example of an image processing device and is, for example, a personal computer.

The image reading device 100 includes a first interface device 101, an imaging device 102, a first storage device 110, and a first processing circuit 120.

The first interface device 101 has an interface circuit conforming to a serial bus such as USB (Universal Serial Bus), and is electrically connected to the information processing device 200 to transmit and receive image data and various kinds of information. Instead of the first interface device 101, a communication device may be used that has an antenna for transmitting and receiving wireless signals and a wireless communication interface circuit for transmitting and receiving signals over a wireless communication line according to a predetermined communication protocol. The predetermined communication protocol is, for example, a wireless LAN (Local Area Network).

The imaging device 102 has a reduction-optics type imaging sensor with CCD (Charge Coupled Device) imaging elements arranged linearly in the main scanning direction. The imaging device 102 further includes a light source that emits light, a lens that forms an image on the imaging elements, and an A/D converter that amplifies the electric signal output from the imaging elements and performs analog/digital (A/D) conversion. In the imaging device 102, the imaging sensor images the conveyed medium to generate and output an analog image signal, and the A/D converter A/D-converts this analog image signal to generate and output a digital input form image. The input form image is a multi-valued color image in which each pixel consists of 24 bits in total: an R (red) value, a G (green) value, and a B (blue) value, each represented by, for example, 8 bits. Instead of the CCD, a unity-magnification-optics type CIS (Contact Image Sensor) with CMOS (Complementary Metal Oxide Semiconductor) imaging elements may be used.

The first storage device 110 includes a memory device such as a RAM (Random Access Memory) or a ROM (Read Only Memory), a fixed disk device such as a hard disk, or a portable storage device such as a flexible disk or an optical disk. The first storage device 110 stores computer programs, databases, tables, and the like used for the various processes of the image reading device 100. The computer programs may be installed in the first storage device 110 from a computer-readable portable recording medium using a known setup program or the like. The portable recording medium is, for example, a CD-ROM (compact disk read only memory) or a DVD-ROM (digital versatile disk read only memory). The first storage device 110 also stores the input form images and the like generated by the imaging device 102.

The first processing circuit 120 operates based on a program stored in advance in the first storage device 110. The first processing circuit 120 is, for example, a CPU (Central Processing Unit). A DSP (digital signal processor), an LSI (large scale integration), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or the like may also be used as the first processing circuit 120.

The first processing circuit 120 is connected to the first interface device 101, the imaging device 102, the first storage device 110, and the like, and controls each of these units. The first processing circuit 120 performs medium reading control of the imaging device 102, control of data transmission and reception with the information processing device 200 via the first interface device 101, and the like.

The information processing device 200 includes a second interface device 201, an input device 202, a display device 203, a second storage device 210, and a second processing circuit 220. Each unit of the information processing device 200 is described in detail below.

The second interface device 201 has an interface circuit similar to that of the first interface device 101 of the image reading device 100, and connects the information processing device 200 to the image reading device 100. Instead of the second interface device 201, a communication device may be used that has an antenna for transmitting and receiving wireless signals and a wireless communication interface circuit for transmitting and receiving signals over a wireless communication line according to a predetermined communication protocol such as a wireless LAN.

The input device 202 has an input device such as a keyboard or a mouse and an interface circuit that acquires signals from that input device, and outputs a signal corresponding to the user's operation to the second processing circuit 220.

The display device 203 is an example of an output unit. The display device 203 has a display composed of liquid crystal, organic EL (Electro-Luminescence), or the like, and an interface circuit that outputs image data to the display. The display device 203 displays various kinds of information on the display according to instructions from the second processing circuit 220.

The second storage device 210 is an example of a storage unit, and has a memory device, a fixed disk device, a portable storage device, or the like similar to those of the first storage device 110 of the image reading device 100. The second storage device 210 stores computer programs, databases, tables, and the like used for the various processes of the information processing device 200. The computer programs may be installed in the second storage device 210 from a computer-readable portable recording medium such as a CD-ROM or a DVD-ROM using a known setup program or the like.

The second storage device 210 also stores, in advance, data such as a layout table and a dictionary table. Each table is described in detail later.

The second processing circuit 220 operates based on a program stored in advance in the second storage device 210. The second processing circuit 220 is, for example, a CPU. A DSP, an LSI, an ASIC, an FPGA, or the like may also be used as the second processing circuit 220.

The second processing circuit 220 is connected to the second interface device 201, the input device 202, the display device 203, the second storage device 210, and the like, and controls each of these units. The second processing circuit 220 performs control of data transmission and reception with the image reading device 100 via the second interface device 201, input control of the input device 202, display control of the display device 203, and the like.

FIG. 2 is a diagram showing an example of the data structure of the layout table.

For each of a plurality of form layout types, the layout table stores, in association with one another, identification information of the type (type ID) and the ruled line information, color information, keyword information, and the like corresponding to that type.

The ruled line information indicates the image pattern of a figure drawn with ruled lines in a form image obtained by imaging a form, and the position of that figure within the form image. For example, the figure drawn with ruled lines is a table, and the image pattern of the entire table is set as the image pattern. Alternatively, the image pattern of each intersection between a horizontally extending straight line and a vertically extending straight line in the table may be set as the image pattern. As the position of the figure, the coordinates of the upper left and lower right corners of the figure's circumscribed rectangle in the form image, or the like, are set. The position of each straight line extending horizontally or vertically in the form image may also be set as the ruled line information. In addition, the image patterns of a plurality of figures and the position of each figure within the form image may be set as the ruled line information.

The color information indicates information about the colors contained in the form image. For example, a histogram is set as the color information for each of the two color-difference components (U, V) of the image, with each color-difference value (U value, V value) as a class and the number of pixels in the form image having that color-difference value as the frequency.
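A minimal sketch of such color-difference histograms follows. The publication does not say how U and V are derived from RGB; the full-range BT.601 (JFIF-style) coefficients, the 256-bin layout, and the function names `rgb_to_uv` and `uv_histograms` are assumptions made for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), 8 bits per channel


def rgb_to_uv(pixel: Pixel) -> Tuple[int, int]:
    """Convert one RGB pixel to (U, V) using assumed BT.601 full-range weights."""
    r, g, b = pixel
    u = round(-0.168736 * r - 0.331264 * g + 0.5 * b + 128)
    v = round(0.5 * r - 0.418688 * g - 0.081312 * b + 128)
    # Clamp to the 8-bit range so every value maps to a histogram bin.
    return min(255, max(0, u)), min(255, max(0, v))


def uv_histograms(image: List[List[Pixel]]) -> Tuple[List[int], List[int]]:
    """Count, per U value and per V value, how many pixels take that value."""
    hist_u = [0] * 256
    hist_v = [0] * 256
    for row in image:
        for pixel in row:
            u, v = rgb_to_uv(pixel)
            hist_u[u] += 1
            hist_v[v] += 1
    return hist_u, hist_v
```

Because the histograms ignore luminance, two scans of the same form that differ mainly in brightness would be expected to yield similar U and V distributions.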

The keyword information indicates one or more character strings (keywords) and the position of each within the form image. The keywords are words such as "invoice", "receipt", and "amount", in particular words used in titles. As the position of a keyword, the coordinates of the upper left and lower right corners of its circumscribed rectangle in the form image, or the like, are set.
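One way to picture a row of the layout table is the following sketch. The class and field names (`LayoutRecord`, `RuledFigure`, `Keyword`, `register_layout`) are hypothetical, and storing the image pattern as raw bytes is an assumption; the publication only specifies which kinds of information are associated with each type ID.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # upper-left (x, y) and lower-right (x, y)


@dataclass
class RuledFigure:
    pattern: bytes  # image pattern of the figure, e.g. a whole table (encoding assumed)
    box: Box        # circumscribed rectangle of the figure in the form image


@dataclass
class Keyword:
    text: str  # e.g. "invoice", "receipt", "amount"
    box: Box   # circumscribed rectangle of the keyword in the form image


@dataclass
class LayoutRecord:
    type_id: str
    ruled_figures: List[RuledFigure] = field(default_factory=list)
    hist_u: List[int] = field(default_factory=lambda: [0] * 256)  # color information
    hist_v: List[int] = field(default_factory=lambda: [0] * 256)
    keywords: List[Keyword] = field(default_factory=list)


def register_layout(table: Dict[str, LayoutRecord], record: LayoutRecord) -> None:
    """Store a record under its type ID, replacing any earlier record."""
    table[record.type_id] = record
```

For example, `register_layout(table, LayoutRecord("T001", keywords=[Keyword("invoice", (40, 20, 200, 60))]))` would add a type whose title keyword sits near the top of the page; the ID and coordinates are made up.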

FIG. 3 is a diagram showing an example of the data structure of the dictionary table.

For each of a plurality of form layout types, the dictionary table stores, in association with one another, the type ID of the type and one or more dictionaries corresponding to that type. A dictionary is an example of an information group, and each dictionary includes a partial image, a feature amount, position information, character information, a priority, and the like. Each dictionary is set by the user in the registration process described later. Some or all of the dictionaries may be set in advance when the information processing device 200 is shipped.

The partial image is a part of an input form image in which recognition of the target characters, that is, the characters to be detected, failed in the past. As the partial image, an image of a region within that input form image that contains a feature of the input form image, in particular a feature corresponding to the target characters, is set. For example, an image containing the target characters is set as the partial image. An image that does not contain the target characters may also be set as the partial image.

The feature amount is a feature amount calculated from the corresponding partial image. For example, an A-KAZE feature amount or an ORB (Oriented FAST and Rotated Binary Robust Independent Elementary Features) feature amount is used as the feature amount. Other feature amounts, such as a Haar-like feature amount or a HOG (Histograms of Oriented Gradients) feature amount, may also be used. A Haar-like feature amount is the difference in luminance values between a plurality of adjacent rectangular regions set arbitrarily in an image region. A HOG feature amount is a histogram of gradient strength for each gradient direction of the pixel values in a local region (cell) within an image region. A plurality of types of feature amounts may be set as the feature amount. Either the partial image or the feature amount may be omitted.
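As an illustration of the Haar-like feature amount described above, the following sketch computes a two-rectangle feature: the mean luminance of the left half of a window minus that of the right half. The integral-image technique is a common way to evaluate such sums and is not stated in the publication; the function names and the choice of a horizontal two-rectangle layout are assumptions.

```python
from typing import List

Image = List[List[int]]  # luminance values, one list per row


def integral_image(img: Image) -> List[List[int]]:
    """ii[y][x] holds the sum of img over rows < y and columns < x."""
    h = len(img)
    w = len(img[0]) if h else 0
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], left: int, top: int,
             right: int, bottom: int) -> int:
    """Sum over the rectangle [left, right) x [top, bottom)."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_two_rect(img: Image, left: int, top: int,
                  width: int, height: int) -> float:
    """Mean luminance of the left half minus the right half of the window."""
    ii = integral_image(img)
    half = width // 2
    a = rect_sum(ii, left, top, left + half, top + height)
    b = rect_sum(ii, left + half, top, left + 2 * half, top + height)
    return (a - b) / (half * height)
```

A strongly negative or positive value indicates a vertical luminance boundary inside the window, while a flat region gives a value near zero.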

The position information indicates the position of the partial image within the form. As the position information, for example, the coordinates of the upper left and lower right corners of the partial image within the input form image are set.

The character information indicates the target characters corresponding to the partial image. The target characters are the detection target in the input form image that contains the partial image, and are the characters (the correct characters) whose recognition or identification failed in the past. For example, the character information indicates the actual characters written in the partial image. The character information may instead indicate actual characters written in a region of the input form image other than the partial image. For example, when the form is an invoice, the name of the billing company or the like is set as the target characters. When the form is a receipt, the name of the issuing company or the like is set as the target characters.

The priority indicates the order in which the dictionaries are referenced.
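The dictionary table and its priority ordering can be sketched as follows. The names `DictionaryEntry` and `entries_in_priority_order` are hypothetical, and two details are assumptions: that a smaller priority number means the dictionary is referenced earlier, and that at least one of the partial image and the feature amount must be present (the text says only that either one may be omitted).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # upper-left (x, y) and lower-right (x, y)


@dataclass
class DictionaryEntry:
    position: Box                          # location of the partial image in the form
    text: str                              # target characters, e.g. a company name
    priority: int                          # assumed: smaller value is referenced first
    partial_image: Optional[bytes] = None  # may be omitted if a feature is stored
    feature: Optional[List[float]] = None  # may be omitted if an image is stored

    def __post_init__(self) -> None:
        # Assumed rule: an entry with neither image nor feature cannot be matched.
        if self.partial_image is None and self.feature is None:
            raise ValueError("entry needs a partial image or a feature amount")


def entries_in_priority_order(table: Dict[str, List[DictionaryEntry]],
                              type_id: str) -> List[DictionaryEntry]:
    """Return the dictionaries for one layout type in reference order."""
    return sorted(table.get(type_id, []), key=lambda e: e.priority)
```

Keeping several dictionaries per type ID allows one layout to carry more than one previously failed region, with the priority deciding which is compared first.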

FIG. 4 is a diagram showing the schematic configuration of the second storage device 210 and the second processing circuit 220.

As shown in FIG. 4, the second storage device 210 stores programs such as an acquisition program 211, a layout detection program 212, an extraction program 213, a character identification program 214, a character recognition program 215, an output control program 216, and a registration program 217. Each of these programs is a functional module implemented by software running on a processor. The second processing circuit 220 reads each program stored in the second storage device 210 and operates according to the programs it has read. The second processing circuit 220 thereby functions as an acquisition unit 221, a layout detection unit 222, an extraction unit 223, a character identification unit 224, a character recognition unit 225, an output control unit 226, and a registration unit 227.

FIG. 5 is a flowchart showing an example of the operation of the image reading process performed by the image reading device 100. The operation of the image reading process is described below with reference to the flowchart shown in FIG. 5. The operation flow described below is executed mainly by the first processing circuit 120, in cooperation with each element of the image reading device 100, based on a program stored in advance in the first storage device 110.

First, the imaging device 102 images a form such as an invoice, a notice, or a certificate as a document, generates an input form image, and stores it in the first storage device 110 (step S101).

Next, the first processing circuit 120 transmits the input form image stored in the first storage device 110 to the information processing device 200 via the first interface device 101 (step S102), and ends the series of steps.

FIG. 6 is a flowchart showing an example of the operation of the recognition process performed by the information processing device 200. The operation of the recognition process is described below with reference to the flowchart shown in FIG. 6. The operation flow described below is executed mainly by the second processing circuit 220, in cooperation with each element of the information processing device 200, based on a program stored in advance in the second storage device 210.

First, the acquisition unit 221 acquires the input form image from the image reading device 100 via the second interface device 201 and stores it in the second storage device 210 (step S201).

Next, the layout detection unit 222 refers to the layout table and detects the form layout type based on the input form image (step S202).

The layout detection unit 222 first detects ruled lines from the input form image. The layout detection unit 222 extracts edge pixels from the input form image and generates an edge image in which the input form image is binarized into edge pixels and non-edge pixels. For each pixel in the input form image, the layout detection unit 222 calculates the absolute value of the difference between the gradation values of the two pixels adjacent to it on either side in the horizontal direction (hereinafter referred to as the adjacent difference value), and extracts the pixel as an edge pixel when the adjacent difference value exceeds a first threshold. The gradation value is a luminance value or a color value (R value, G value, or B value). The first threshold can be set, for example, to a difference in luminance value (for example, 20) at which a person can visually distinguish a difference in luminance on an image. The layout detection unit 222 also calculates the adjacent difference value in the vertical direction, and likewise extracts the pixel as an edge pixel when that adjacent difference value exceeds the first threshold. The layout detection unit 222 treats the pixels that were not extracted as edge pixels as non-edge pixels.
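The edge extraction described above can be sketched in a few lines of Python. The function name `edge_image` and the handling of pixels on the image border (a direction is skipped when the pixel lacks a neighbor on one side) are assumptions; the threshold default of 20 is the example value given in the text.

```python
from typing import List

Gray = List[List[int]]  # gradation (e.g. luminance) values, one list per row


def edge_image(gray: Gray, threshold: int = 20) -> List[List[bool]]:
    """Binarize an image into edge (True) and non-edge (False) pixels."""
    h = len(gray)
    w = len(gray[0]) if h else 0
    edges = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Horizontal adjacent difference: left neighbor vs right neighbor.
            if 0 < x < w - 1 and abs(gray[y][x + 1] - gray[y][x - 1]) > threshold:
                edges[y][x] = True
            # Vertical adjacent difference: upper neighbor vs lower neighbor.
            elif 0 < y < h - 1 and abs(gray[y + 1][x] - gray[y - 1][x]) > threshold:
                edges[y][x] = True
    return edges
```

Note that the comparison is strict, so a difference exactly equal to the threshold does not produce an edge pixel, matching the wording "exceeds the first threshold".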

なお、レイアウト検出部２２２は、入力帳票画像内の画素から水平又は垂直方向に所定距離だけ離れた画素の階調値の差の絶対値を隣接差分値として算出してもよい。また、レイアウト検出部２２２は、特定の画素の階調値が第１閾値未満であり、その特定の画素に隣接する画素又はその特定の画素から所定距離だけ離れた画素の階調値が第１閾値以上である場合、その特定の画素をエッジ画素として抽出してもよい。 The layout detection unit 222 may instead calculate, as the adjacent difference value, the absolute value of the difference between the gradation values of pixels separated from the pixel of interest by a predetermined distance in the horizontal or vertical direction. Alternatively, when the gradation value of a specific pixel is less than the first threshold value and the gradation value of a pixel adjacent to the specific pixel, or of a pixel separated from it by a predetermined distance, is equal to or greater than the first threshold value, the layout detection unit 222 may extract the specific pixel as an edge pixel.
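The adjacent-difference rule for edge pixels can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of the layout detection unit 222: the function name `edge_image`, the nested-list image representation, and the choice to leave border pixels as non-edge are all hypothetical. The default threshold of 20 follows the example value in the text.

```python
def edge_image(gray, threshold=20):
    """Binarize a grayscale image (list of rows of ints) into edge (1) / non-edge (0).

    Assumption: a pixel without both neighbors in a direction (image border)
    is simply not tested in that direction.
    """
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Horizontal adjacent difference: |right neighbor - left neighbor|.
            if 0 < x < w - 1 and abs(gray[y][x + 1] - gray[y][x - 1]) > threshold:
                edges[y][x] = 1
            # Vertical adjacent difference: |lower neighbor - upper neighbor|.
            if 0 < y < h - 1 and abs(gray[y + 1][x] - gray[y - 1][x]) > threshold:
                edges[y][x] = 1
    return edges
```

Because the difference is taken between the two neighbors of a pixel, a one-pixel-wide dark line marks the pixels on either side of it as edge pixels rather than the line pixel itself.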

次に、レイアウト検出部２２２は、例えば、エッジ画像内でエッジ画素により非エッジ画素が囲まれた領域を、表のセルに対応するセル領域として検出する。レイアウト検出部２２２は、エッジ画像内で相互に隣接するエッジ画素で囲まれる第１連結領域をラベリングによりグループ化し、各第１連結領域の内、水平又は垂直方向のサイズが第１サイズ以上である第１連結領域を抽出する。第１サイズは、内部に文字を含むことが可能なサイズに設定され、例えば１６ポイントに相当する画素数に設定される。レイアウト検出部２２２は、抽出した各第１連結領域内で、隣接する非エッジ画素で囲まれる第２連結領域をラベリングによりグループ化し、各第２連結領域の内、水平又は垂直方向の長さが第２サイズ以上である第２連結領域を抽出する。第２サイズは、文字の最低サイズに設定され、例えば８ポイントに相当する画素数に設定される。レイアウト検出部２２２は、抽出した第２連結領域に隣接し且つその第２連結領域を囲むエッジ画素で囲まれた領域（第２連結領域を除く領域）をセル領域として検出する。 Next, the layout detection unit 222 detects, for example, a region of the edge image in which non-edge pixels are enclosed by edge pixels as a cell region corresponding to a table cell. The layout detection unit 222 groups, by labeling, first connected regions each surrounded by mutually adjacent edge pixels in the edge image, and extracts those first connected regions whose horizontal or vertical size is equal to or greater than a first size. The first size is set to a size that can contain characters, for example the number of pixels corresponding to 16 points. Within each extracted first connected region, the layout detection unit 222 groups, by labeling, second connected regions each surrounded by adjacent non-edge pixels, and extracts those second connected regions whose horizontal or vertical length is equal to or greater than a second size. The second size is set to the minimum character size, for example the number of pixels corresponding to 8 points. The layout detection unit 222 detects, as a cell region, the region enclosed by the edge pixels that adjoin and surround an extracted second connected region (the region excluding the second connected region itself).

なお、レイアウト検出部２２２は、エッジ画像内でエッジ画素が連続する直線を抽出し、抽出した直線で囲まれる領域（その内側領域を除く領域）をセル領域として検出してもよい。その場合、レイアウト検出部２２２は、例えばモルフォロジー変換を用いて、直線を抽出する。レイアウト検出部２２２は、エッジ画像内で水平方向において非エッジ画素と隣接するエッジ画素を非エッジ画素に変換する収縮処理を所定回数（第１サイズ分）実行した後、エッジ画素と隣接する非エッジ画素をエッジ画素に変換する膨張処理を所定回数実行する。レイアウト検出部２２２は、残ったエッジ画素を水平方向に延伸する直線として抽出する。同様に、レイアウト検出部２２２は、エッジ画像内で、垂直方向において非エッジ画素と隣接するエッジ画素を非エッジ画素に変換する収縮処理を所定回数実行した後、エッジ画素と隣接する非エッジ画素をエッジ画素に変換する膨張処理を所定回数実行する。レイアウト検出部２２２は、残ったエッジ画素を垂直方向に延伸する直線として抽出する。 The layout detection unit 222 may instead extract straight lines formed by consecutive edge pixels in the edge image and detect, as a cell region, the region bounded by the extracted lines (excluding the area inside them). In that case, the layout detection unit 222 extracts the straight lines by using, for example, a morphological transformation. The layout detection unit 222 executes, a predetermined number of times (corresponding to the first size), an erosion process that converts edge pixels horizontally adjacent to non-edge pixels in the edge image into non-edge pixels, and then executes, a predetermined number of times, a dilation process that converts non-edge pixels adjacent to edge pixels into edge pixels. The layout detection unit 222 extracts the remaining edge pixels as straight lines extending in the horizontal direction. Similarly, the layout detection unit 222 executes, a predetermined number of times, an erosion process that converts edge pixels vertically adjacent to non-edge pixels in the edge image into non-edge pixels, and then executes, a predetermined number of times, a dilation process that converts non-edge pixels adjacent to edge pixels into edge pixels. The layout detection unit 222 extracts the remaining edge pixels as straight lines extending in the vertical direction.
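The erosion-then-dilation procedure (a morphological opening along one axis) can be illustrated as follows. This is a sketch under assumptions rather than the patented implementation: the helper names are hypothetical, the edge image is a nested list of 0/1 values, pixels outside the image are treated as non-edge, and the vertical case is handled by transposing the image.

```python
def _erode_row(row):
    """Clear every edge pixel that has a non-edge (or out-of-image) horizontal neighbor."""
    w = len(row)
    return [
        1 if row[x] and 0 < x < w - 1 and row[x - 1] and row[x + 1] else 0
        for x in range(w)
    ]


def _dilate_row(row):
    """Set every non-edge pixel that has an edge pixel as a horizontal neighbor."""
    w = len(row)
    return [
        1 if row[x] or (x > 0 and row[x - 1]) or (x < w - 1 and row[x + 1]) else 0
        for x in range(w)
    ]


def extract_lines(edges, n, vertical=False):
    """Keep only runs of edge pixels longer than 2*n along the chosen direction."""
    if vertical:
        # Transpose, process as horizontal, transpose back.
        transposed = [list(col) for col in zip(*edges)]
        return [list(col) for col in zip(*extract_lines(transposed, n))]
    out = [row[:] for row in edges]
    for _ in range(n):          # erosion, n times
        out = [_erode_row(row) for row in out]
    for _ in range(n):          # dilation, n times
        out = [_dilate_row(row) for row in out]
    return out
```

With these border assumptions, each erosion shortens a run by one pixel at both ends, so runs of `2 * n` pixels or fewer disappear (character strokes, noise) and longer runs are restored to their original length by the dilation.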

次に、レイアウト検出部２２２は、レイアウトテーブルに記憶されたレイアウトの種類毎に、対応する罫線情報に示される画像パターンと、エッジ画像から検出されたセル領域との罫線類似度を算出する。レイアウト検出部２２２は、エッジ画像から、各罫線情報に示される位置に対応し且つ画像パターンと同一の大きさを有する領域を、その位置をずらしながら切り出した切り出し画像と、画像パターンとの類似の程度を算出する。類似の程度は、例えば正規化相互相関値である。なお、類似の程度は、ＳＳＤ（Sum of Squared Difference）の逆数又はＳＡＤ（Sum of Absolute Difference）の逆数でもよい。レイアウト検出部２２２は、各切り出し画像と画像パターンとの類似の程度の最大値をそのレイアウトの種類に対する罫線類似度として算出する。なお、罫線情報として複数の図形の画像パターンが設定されている場合、レイアウト検出部２２２は、複数の図形の画像パターン毎に算出した類似の程度の最大値の平均値、中央値、最小値又は最大値をその帳票データに対する罫線類似度として算出する。 Next, for each layout type stored in the layout table, the layout detection unit 222 calculates a ruled line similarity between the image pattern indicated by the corresponding ruled line information and the cell regions detected from the edge image. The layout detection unit 222 cuts out, from the edge image, regions that correspond to the position indicated by each piece of ruled line information and have the same size as the image pattern, shifting the position each time, and calculates the degree of similarity between each cut-out image and the image pattern. The degree of similarity is, for example, a normalized cross-correlation value. The degree of similarity may instead be the reciprocal of the SSD (Sum of Squared Differences) or the reciprocal of the SAD (Sum of Absolute Differences). The layout detection unit 222 takes the maximum of the degrees of similarity between the cut-out images and the image pattern as the ruled line similarity for that layout type. When image patterns of a plurality of figures are set as the ruled line information, the layout detection unit 222 takes, as the ruled line similarity for the form data, the average, median, minimum or maximum of the per-pattern maximum degrees of similarity.
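The sliding comparison against a stored image pattern might look like the sketch below. Several details are assumptions, since the text does not fix them: the zero-mean form of normalized cross-correlation, the search range of plus or minus `search` pixels around the stored position, the floor of 0.0 on the result, and the names `ncc` and `ruled_line_similarity`.

```python
import math


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-size 2D patches."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    mean_a, mean_b = sum(fa) / len(fa), sum(fb) / len(fb)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(fa, fb))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in fa) * sum((y - mean_b) ** 2 for y in fb)
    )
    return num / den if den else 0.0  # flat patches carry no pattern


def ruled_line_similarity(edges, pattern, top, left, search=2):
    """Maximum NCC over windows shifted around the stored position (top, left)."""
    ph, pw = len(pattern), len(pattern[0])
    h, w = len(edges), len(edges[0])
    best = 0.0  # assumption: negative correlations are floored at 0
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + ph > h or x + pw > w:
                continue  # window would leave the image
            patch = [row[x:x + pw] for row in edges[y:y + ph]]
            best = max(best, ncc(patch, pattern))
    return best
```

Taking the maximum over shifted windows makes the score tolerant of small scanning offsets between the input form image and the stored pattern.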

また、罫線情報として、画像内の水平又は垂直方向に延伸する各直線の位置が設定されている場合、レイアウト検出部２２２は、公知の画像処理技術を利用して、入力帳票画像から直線を検出する。レイアウト検出部２２２は、各レイアウトの種類に対する罫線情報において設定された直線の総数に対する、入力帳票画像の対応する位置から検出された直線の数の割合を、そのレイアウトの種類に対する罫線類似度として算出する。 When the position of each straight line extending in the horizontal or vertical direction in the image is set as the ruled line information, the layout detection unit 222 detects straight lines from the input form image by using a known image processing technique. The layout detection unit 222 calculates, as the ruled line similarity for each layout type, the ratio of the number of straight lines detected at the corresponding positions in the input form image to the total number of straight lines set in the ruled line information for that layout type.

また、レイアウト検出部２２２は、画像の二種類の色差のそれぞれについて、各色差値を階級とし、入力帳票画像内で各色差値を示す画素数を度数とするヒストグラムを生成する。次に、レイアウト検出部２２２は、レイアウトテーブルに記憶されたレイアウトの種類毎に、対応する色情報に示されるヒストグラムと、入力帳票画像から生成されたヒストグラムとの色類似度を算出する。レイアウト検出部２２２は、色情報に示される各ヒストグラムと、入力帳票画像から生成した各ヒストグラムとの類似の程度を算出し、算出した類似の程度の平均値又は合計値等を、各レイアウトの種類に対する色類似度として算出する。類似の程度は、例えば各ヒストグラムの各階級の度数を要素とする各ベクトルの内積値である。 The layout detection unit 222 also generates, for each of the two types of color difference of the image, a histogram in which each color difference value is a class and the number of pixels in the input form image having that color difference value is the frequency. Next, for each layout type stored in the layout table, the layout detection unit 222 calculates a color similarity between the histograms indicated by the corresponding color information and the histograms generated from the input form image. The layout detection unit 222 calculates the degree of similarity between each histogram indicated by the color information and the corresponding histogram generated from the input form image, and takes the average, the sum, or the like of the calculated degrees of similarity as the color similarity for that layout type. The degree of similarity is, for example, the inner product of the vectors whose elements are the frequencies of the classes of the respective histograms.
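As a rough illustration, the histogram-based color similarity could be computed as below. The function names, the `bins` parameter, and the use of the average (rather than the sum) are assumptions. The text does not name the two color-difference channels; Cb and Cr of a YCbCr representation would be one plausible reading.

```python
def histogram(values, bins=256):
    """Count how many pixels take each color-difference value (0 .. bins-1)."""
    hist = [0] * bins
    for v in values:
        hist[v] += 1
    return hist


def color_similarity(input_hists, stored_hists):
    """Average, over the color-difference channels, of the histogram inner products."""
    sims = [
        sum(a * b for a, b in zip(h_in, h_ref))
        for h_in, h_ref in zip(input_hists, stored_hists)
    ]
    return sum(sims) / len(sims)
```

A raw inner product grows with the number of pixels, so comparing images of different sizes would need some normalization of the histograms first; the text does not say whether one is applied.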

また、レイアウト検出部２２２は、レイアウトテーブルに記憶されたレイアウトの種類毎に、対応するキーワード情報に示される位置に対応する入力帳票画像内の位置から文字を検出する。レイアウト検出部２２２は、公知のＯＣＲ（Optical Character Recognition）技術を利用して、文字を検出する。レイアウト検出部２２２は、検出した文字がキーワード情報に示されるキーワードと一致するか否かを判定する。レイアウト検出部２２２は、キーワード情報において設定されたキーワードの総数に対する、入力帳票画像から検出した文字と一致したキーワードの数の割合を、各レイアウトの種類に対するキーワード類似度として算出する。 Further, the layout detection unit 222 detects characters from the positions in the input form image corresponding to the positions shown in the corresponding keyword information for each type of layout stored in the layout table. The layout detection unit 222 detects characters by using a known OCR (Optical Character Recognition) technique. The layout detection unit 222 determines whether or not the detected characters match the keywords shown in the keyword information. The layout detection unit 222 calculates the ratio of the number of keywords that match the characters detected from the input form image to the total number of keywords set in the keyword information as the keyword similarity for each layout type.

レイアウト検出部２２２は、レイアウトテーブルに記憶されたレイアウトの種類毎に、算出した罫線類似度、色類似度及びキーワード類似度の平均値を、各レイアウトの種類に対する類似度として算出する。一般に、種類が異なるレイアウトでは、表または直線等の罫線の配置が異なっている可能性が高いが、色は類似している可能性が高い。そこで、レイアウト検出部２２２は、各レイアウトの種類に対するレイアウト類似度として、罫線類似度、キーワード類似度、色類似度の順に重みが大きくなるように罫線類似度、キーワード類似度及び色類似度の重み付け和を算出してもよい。なお、レイアウト検出部２２２は、罫線類似度、色類似度及びキーワード類似度の内の何れか一つ又は二つに基づいて類似度を算出してもよい。レイアウト検出部２２２は、レイアウトテーブルに記憶された帳票レイアウトの種類の内、類似度が最大である帳票レイアウトの種類を、入力帳票画像の帳票レイアウトの種類として検出する。 For each layout type stored in the layout table, the layout detection unit 222 calculates the average of the calculated ruled line similarity, color similarity and keyword similarity as the similarity for that layout type. In general, layouts of different types are likely to differ in the arrangement of ruled lines such as tables or straight lines, but are likely to be similar in color. The layout detection unit 222 may therefore calculate, as the layout similarity for each layout type, a weighted sum of the ruled line similarity, the keyword similarity and the color similarity in which the ruled line similarity is given the largest weight, followed by the keyword similarity and then the color similarity. The layout detection unit 222 may also calculate the similarity based on any one or two of the ruled line similarity, the color similarity and the keyword similarity. Among the form layout types stored in the layout table, the layout detection unit 222 detects the one with the highest similarity as the form layout type of the input form image.
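The combination of the three similarities and the final selection can be sketched as follows. The weight values (0.5, 0.3, 0.2) are purely illustrative assumptions; the text only indicates that ruled lines are more discriminative than color. The function names and the dictionary-of-tuples input are likewise hypothetical.

```python
def layout_similarity(ruled, keyword, color, weights=(0.5, 0.3, 0.2)):
    """Weighted sum; the default weights favor ruled lines, then keywords, then color."""
    w_ruled, w_keyword, w_color = weights
    return w_ruled * ruled + w_keyword * keyword + w_color * color


def detect_layout(scores, weights=(0.5, 0.3, 0.2)):
    """Return the layout type whose combined similarity is highest.

    scores maps a layout type name to (ruled, keyword, color) similarities.
    """
    return max(scores, key=lambda name: layout_similarity(*scores[name], weights))
```

For example, with `{"A": (0.9, 0.5, 0.2), "B": (0.3, 0.6, 0.9)}` the weighted sum selects layout A, whereas equal weights (the plain average) would select layout B on the strength of its color match alone.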

次に、抽出部２２３は、レイアウト検出部２２２により検出された帳票レイアウトの種類に対応する辞書が辞書テーブルに記憶されているか否かを判定する（ステップＳ２０３）。検出された帳票レイアウトの種類に対応する辞書が辞書テーブルに記憶されていない場合、抽出部２２３は、処理をステップＳ２０９へ移行する。 Next, the extraction unit 223 determines whether or not the dictionary corresponding to the type of form layout detected by the layout detection unit 222 is stored in the dictionary table (step S203). If the dictionary corresponding to the detected form layout type is not stored in the dictionary table, the extraction unit 223 shifts the process to step S209.

一方、検出された帳票レイアウトの種類に対応する辞書が辞書テーブルに記憶されている場合、抽出部２２３は、検出された帳票レイアウトの種類に対応する辞書の中から特定の辞書を抽出する（ステップＳ２０４）。例えば、抽出部２２３は、各辞書に対応付けられた優先順位が高い順に、辞書を抽出する。 On the other hand, when a dictionary corresponding to the detected form layout type is stored in the dictionary table, the extraction unit 223 extracts a specific dictionary from among the dictionaries corresponding to the detected form layout type (step S204). For example, the extraction unit 223 extracts the dictionaries in descending order of the priority associated with each dictionary.

次に、抽出部２２３は、抽出した辞書に記憶されている位置情報に基づいて、入力帳票画像から入力部分画像を抽出する（ステップＳ２０５）。抽出部２２３は、入力帳票画像内で、抽出した辞書に記憶されている位置情報に示される領域の画像を入力部分画像として抽出する。このように、抽出部２２３は、検出された帳票レイアウトの種類に対応して辞書テーブルに記憶されている位置情報に基づいて、入力帳票画像から入力部分画像を抽出する。 Next, the extraction unit 223 extracts an input partial image from the input form image based on the position information stored in the extracted dictionary (step S205). The extraction unit 223 extracts an image of a region shown in the position information stored in the extracted dictionary as an input partial image in the input form image. In this way, the extraction unit 223 extracts the input partial image from the input form image based on the position information stored in the dictionary table corresponding to the detected type of form layout.

次に、文字特定部２２４は、抽出された入力部分画像と、抽出された辞書に記憶されている部分画像とが類似するか否かを判定する（ステップＳ２０６）。文字特定部２２４は、入力部分画像と部分画像との類似の程度を算出し、算出した類似の程度が閾値以上であるか否かにより、入力部分画像と部分画像とが類似するか否かを判定する。類似の程度は、例えば正規化相互相関値である。なお、類似の程度は、ＳＳＤの逆数又はＳＡＤの逆数でもよい。このように、文字特定部２２４は、抽出された入力部分画像と、検出された帳票レイアウトの種類に対応して辞書テーブルに記憶されている部分画像とが類似するか否かを判定する。 Next, the character identification unit 224 determines whether the extracted input partial image is similar to the partial image stored in the extracted dictionary (step S206). The character identification unit 224 calculates the degree of similarity between the input partial image and the partial image, and determines whether the two are similar according to whether the calculated degree of similarity is equal to or greater than a threshold value. The degree of similarity is, for example, a normalized cross-correlation value. The degree of similarity may instead be the reciprocal of the SSD or the reciprocal of the SAD. In this way, the character identification unit 224 determines whether the extracted input partial image is similar to the partial image stored in the dictionary table for the detected form layout type.

入力部分画像と部分画像が類似する場合、文字特定部２２４は、抽出された辞書に記憶されている文字情報に示される文字を、入力帳票画像から検出する対象文字として特定し（ステップＳ２０７）、処理をステップＳ２１０へ移行する。このように、文字特定部２２４は、入力部分画像と、検出された帳票レイアウトの種類に対応して記憶されている部分画像とが類似する場合、入力帳票画像に、その帳票レイアウトの種類に対応して記憶されている文字情報が記載されているものとする。 When the input partial image and the partial image are similar, the character identification unit 224 identifies the characters indicated by the character information stored in the extracted dictionary as the target characters to be detected from the input form image (step S207), and shifts the process to step S210. In this way, when the input partial image is similar to the partial image stored for the detected form layout type, the character identification unit 224 regards the input form image as containing the character information stored for that form layout type.

一方、入力部分画像と部分画像が類似しない場合、文字特定部２２４は、検出された帳票レイアウトの種類に対応する辞書の内、まだ処理されていない辞書が存在するか否かを判定する（ステップＳ２０８）。 On the other hand, when the input partial image and the partial image are not similar, the character identification unit 224 determines whether any dictionary corresponding to the detected form layout type has not yet been processed (step S208).

まだ処理されていない辞書が存在する場合、文字特定部２２４は、処理をステップＳ２０４へ戻し、ステップＳ２０４〜Ｓ２０８の処理を繰り返す。この場合、ステップＳ２０４において、抽出部２２３は、検出された帳票レイアウトの種類に対応し且つまだ処理されていない辞書の中で、優先順位が最も高い辞書を抽出する。ステップＳ２０５において、抽出部２２３は、新たに抽出された辞書に含まれる位置情報に基づいて、入力帳票画像から第２入力部分画像を抽出する。第２入力部分画像と新たに抽出された辞書に含まれる部分画像とが類似する場合、ステップＳ２０７において、文字特定部２２４は、新たに抽出された辞書に記憶されている文字情報に示される文字を、入力帳票画像から検出する対象文字として特定する。 If a dictionary that has not yet been processed exists, the character identification unit 224 returns the process to step S204 and repeats steps S204 to S208. In this case, in step S204, the extraction unit 223 extracts the dictionary with the highest priority among the dictionaries that correspond to the detected form layout type and have not yet been processed. In step S205, the extraction unit 223 extracts a second input partial image from the input form image based on the position information included in the newly extracted dictionary. When the second input partial image is similar to the partial image included in the newly extracted dictionary, in step S207, the character identification unit 224 identifies the characters indicated by the character information stored in the newly extracted dictionary as the target characters to be detected from the input form image.

このように、入力部分画像と、検出された帳票レイアウトの種類に対応して辞書テーブルに記憶されている特定の辞書に含まれる部分画像とが類似しない場合、抽出部２２３は、その帳票レイアウトの種類に対応して記憶されている他の辞書を抽出する。そして、抽出部２２３は、抽出した他の辞書に含まれる位置情報に基づいて、入力帳票画像から第２入力部分画像を抽出する。文字特定部２２４は、第２入力部分画像とその他の辞書に含まれる部分画像とが類似する場合、入力帳票画像には、その他の辞書に含まれる文字情報が記載されているものとする。これにより、文字特定部２２４は、それぞれ別個の領域に対応する複数の辞書を用いて、検出対象の文字をより精度良く特定することができる。 In this way, when the input partial image is not similar to the partial image included in a specific dictionary stored in the dictionary table for the detected form layout type, the extraction unit 223 extracts another dictionary stored for that form layout type. The extraction unit 223 then extracts a second input partial image from the input form image based on the position information included in the extracted other dictionary. When the second input partial image is similar to the partial image included in that other dictionary, the character identification unit 224 regards the input form image as containing the character information included in that other dictionary. As a result, the character identification unit 224 can identify the characters to be detected more accurately by using a plurality of dictionaries, each corresponding to a separate region.

一方、検出された帳票レイアウトの種類に対応する全ての辞書について既に処理された場合、文字認識部２２５は、入力帳票画像からＯＣＲにより、対象文字を示す文字情報を認識する（ステップＳ２０９）。 On the other hand, when all the dictionaries corresponding to the detected form layout types have already been processed, the character recognition unit 225 recognizes the character information indicating the target character from the input form image by OCR (step S209).

例えば、文字認識部２２５は、公知のＯＣＲ技術を利用して、入力帳票画像から「会社名」等のキーワードを検出する。文字認識部２２５は、検出した文字列に対して所定の位置関係を有する文字を対象文字として検出する。所定の位置関係は、方向（例えば右側、下側、右下側）及び距離（例えば３０ｍｍに相当する画素内）を含み、事前に設定される。 For example, the character recognition unit 225 detects a keyword such as "company name" from the input form image by using a known OCR technique. The character recognition unit 225 detects, as the target characters, characters that have a predetermined positional relationship with the detected character string. The predetermined positional relationship includes a direction (for example, to the right, below, or to the lower right) and a distance (for example, within the number of pixels corresponding to 30 mm) and is set in advance.

なお、文字認識部２２５は、入力帳票画像から「会社」等のキーワードを含む文字列を検出してもよい。文字認識部２２５は、所定の優先順位に従って、検出した文字列の中から対象文字を抽出する。所定の優先順位は、位置条件（例えば検出した文字列の内、最も右側又は最も下側に位置する文字列）等を含み、事前に設定される。例えば、帳票が請求書であり、対象文字が請求元の会社名である場合、請求書には、請求元の会社名と請求先の会社名が含まれる可能性が高い。一般に、請求先の会社名は左上側に記載され、請求元の会社名は右下側に記載される可能性が高い。そのため、文字認識部２２５は、検出した文字列の内、最も右側又は最も下側に位置する文字列を対象文字として検出することにより、請求元の会社名を精度良く検出することができる。 The character recognition unit 225 may instead detect, from the input form image, character strings that contain a keyword such as "company". The character recognition unit 225 extracts the target characters from among the detected character strings according to a predetermined priority. The predetermined priority includes a positional condition (for example, the rightmost or lowest of the detected character strings) and is set in advance. For example, when the form is an invoice and the target characters are the company name of the invoice issuer, the invoice is likely to contain both the issuer's company name and the recipient's company name. In general, the recipient's company name is likely to appear toward the upper left and the issuer's company name toward the lower right. The character recognition unit 225 can therefore detect the issuer's company name accurately by taking the rightmost or lowest of the detected character strings as the target characters.
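The positional priority rule can be illustrated with a small sketch. The shape of the OCR output (a list of dictionaries with `text`, `x` and `y` keys), the function name `pick_issuer_name`, and the image coordinate convention (origin at the top left, `y` growing downward) are assumptions made for the example; no particular OCR library is implied.

```python
def pick_issuer_name(ocr_results, keyword="Company", prefer="bottom"):
    """Among strings containing the keyword, return the lowest or the rightmost one.

    ocr_results: list of {"text": str, "x": int, "y": int} (hypothetical format).
    Returns None when no detected string contains the keyword.
    """
    candidates = [r for r in ocr_results if keyword in r["text"]]
    if not candidates:
        return None
    if prefer == "bottom":
        chosen = max(candidates, key=lambda r: r["y"])  # largest y = lowest on page
    else:
        chosen = max(candidates, key=lambda r: r["x"])  # largest x = rightmost
    return chosen["text"]
```

On an invoice where the billing destination appears at the upper left and the billing source at the lower right, either setting of `prefer` returns the billing source's name.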

このように、文字認識部２２５は、入力部分画像が、検出された帳票レイアウトの種類に対応する全ての辞書に記憶された部分画像と類似しない場合、入力帳票画像からＯＣＲにより文字情報を認識する。また、文字認識部２２５は、検出された帳票レイアウトの種類に対応する辞書が辞書テーブルに記憶されていない場合も、入力帳票画像からＯＣＲにより文字情報を認識する。これにより、文字認識部２２５は、入力帳票画像に対応する辞書がまだ登録されていない場合でも、検出対象の文字を認識することができる。 As described above, when the input partial image is not similar to the partial image stored in any of the dictionaries corresponding to the detected form layout type, the character recognition unit 225 recognizes the character information from the input form image by OCR. The character recognition unit 225 also recognizes the character information from the input form image by OCR when no dictionary corresponding to the detected form layout type is stored in the dictionary table. As a result, the character recognition unit 225 can recognize the characters to be detected even when a dictionary corresponding to the input form image has not yet been registered.
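The control flow of steps S203 to S209 (try each dictionary in priority order, fall back to OCR) can be summarized in a short sketch. Everything concrete here is hypothetical: the dictionary field names, the convention that a smaller `priority` number means a higher priority, the default threshold of 0.8, and the injected `crop`, `similarity` and `ocr` callables, which stand in for the extraction unit, the similarity calculation and the OCR engine.

```python
def identify_text(form, dictionaries, crop, similarity, ocr, threshold=0.8):
    """Return (text, source), where source is "dictionary" or "ocr".

    dictionaries: list of {"priority", "position", "partial_image", "text"}
    for the detected layout type; an empty list covers the "no dictionary" case.
    """
    # S204: take dictionaries from highest to lowest priority.
    for d in sorted(dictionaries, key=lambda d: d["priority"]):
        patch = crop(form, d["position"])                    # S205
        if similarity(patch, d["partial_image"]) >= threshold:  # S206
            return d["text"], "dictionary"                   # S207
        # S208: otherwise continue with the next unprocessed dictionary.
    return ocr(form), "ocr"                                  # S209
```

Passing the three operations in as functions keeps the sketch independent of any particular image representation; with an empty dictionary list the loop body never runs and the function goes straight to OCR, mirroring the branch from step S203 to step S209.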

次に、出力制御部２２６は、ステップＳ２０７で特定された文字情報又はステップＳ２０９で認識された文字情報を表示装置２０３に表示することにより出力する（ステップＳ２１０）。なお、出力制御部２２６は、ステップＳ２０７で特定された文字情報又はステップＳ２０９で認識された文字情報を、第２インタフェース装置２０１を介して他の情報処理装置に送信することにより出力してもよい。 Next, the output control unit 226 outputs the character information identified in step S207 or the character information recognized in step S209 by displaying it on the display device 203 (step S210). The output control unit 226 may instead output the character information identified in step S207 or the character information recognized in step S209 by transmitting it to another information processing device via the second interface device 201.

次に、登録部２２７は、利用者から入力装置２０２を用いて又は第２インタフェース装置２０１を介して、出力された文字情報が誤りであるか否かの指定を受け付ける（ステップＳ２１１）。 Next, the registration unit 227 receives a designation from the user whether or not the output character information is incorrect using the input device 202 or via the second interface device 201 (step S211).

次に、登録部２２７は、利用者により、出力された文字情報、即ちステップＳ２０７で特定された辞書テーブルに記憶されている文字情報又はステップＳ２０９で認識された文字情報が誤りであると判定されたか否かを判定する（ステップＳ２１２）。利用者により、出力された文字情報が誤りであると判定されなかった場合、登録部２２７は、一連のステップを終了する。 Next, the registration unit 227 determines whether the user has judged the output character information, that is, the character information stored in the dictionary table and identified in step S207 or the character information recognized in step S209, to be incorrect (step S212). When the user has not judged the output character information to be incorrect, the registration unit 227 ends the series of steps.

一方、利用者により、出力された文字情報が誤りであると判定された場合、登録部２２７は、登録処理を実行し（ステップＳ２１３）、一連のステップを終了する。即ち、辞書テーブルに、検出された帳票レイアウトの種類に対応する辞書が記憶されておらず且つ認識された文字情報が誤りであった場合、登録部２２７は、登録処理を実行する。なお、登録部２２７は、検出された帳票レイアウトの種類に対応する辞書が記憶されていない場合、認識された文字情報が正しかったか誤りであったかに関わらず、登録処理を実行してもよい。登録処理の詳細については後述する。 On the other hand, when the user determines that the output character information is incorrect, the registration unit 227 executes the registration process (step S213) and ends a series of steps. That is, when the dictionary corresponding to the detected form layout type is not stored in the dictionary table and the recognized character information is incorrect, the registration unit 227 executes the registration process. If the dictionary corresponding to the detected form layout type is not stored, the registration unit 227 may execute the registration process regardless of whether the recognized character information is correct or incorrect. The details of the registration process will be described later.

なお、ステップＳ２０６において、文字特定部２２４は、画像が類似するか否かを判定する代わりに、特徴量が類似するか否かを判定してもよい。その場合、文字特定部２２４は、抽出された入力部分画像の特徴量を算出し、算出した特徴量と、抽出された辞書に記憶されている特徴量とが類似するか否かを判定する。文字特定部２２４が算出する特徴量は、辞書に記憶されている特徴量と同じ種類の特徴量である。文字特定部２２４は、入力部分画像の特徴量と辞書に記憶されている特徴量との類似の程度を算出する。文字特定部２２４は、算出した類似の程度が閾値以上であるか否かにより、入力部分画像の特徴量と辞書に記憶されている特徴量とが類似するか否かを判定する。類似の程度は、例えば各特徴量（特徴ベクトル）の内積値である。 In step S206, the character identification unit 224 may determine whether the feature amounts are similar instead of determining whether the images are similar. In that case, the character identification unit 224 calculates the feature amount of the extracted input partial image and determines whether the calculated feature amount is similar to the feature amount stored in the extracted dictionary. The feature amount calculated by the character identification unit 224 is of the same type as the feature amount stored in the dictionary. The character identification unit 224 calculates the degree of similarity between the feature amount of the input partial image and the feature amount stored in the dictionary, and determines whether the two are similar according to whether the calculated degree of similarity is equal to or greater than a threshold value. The degree of similarity is, for example, the inner product of the feature amounts (feature vectors).

入力部分画像の特徴量と辞書に記憶されている特徴量とが類似する場合、ステップＳ２０７で、文字特定部２２４は、その辞書に記憶されている文字情報に示される文字を対象文字として特定する。また、入力部分画像の特徴量と辞書に記憶されている特徴量とが類似しない場合、ステップＳ２０４で、抽出部２２３は、検出された帳票レイアウトの種類に対応する他の辞書を新たに抽出する。そして、ステップＳ２０５で、抽出部２２３は、新たに抽出した辞書に含まれる位置情報に基づいて、入力帳票画像から第２入力部分画像を抽出する。第２入力部分画像の特徴量と新たに抽出された辞書に記憶されている特徴量とが類似する場合、ステップＳ２０７で、文字特定部２２４は、入力帳票画像には、新たに抽出された辞書に記憶されている文字情報に示される文字を対象文字として特定する。 When the feature amount of the input partial image is similar to the feature amount stored in the dictionary, in step S207, the character identification unit 224 identifies the characters indicated by the character information stored in that dictionary as the target characters. When the two feature amounts are not similar, in step S204, the extraction unit 223 newly extracts another dictionary corresponding to the detected form layout type. Then, in step S205, the extraction unit 223 extracts a second input partial image from the input form image based on the position information included in the newly extracted dictionary. When the feature amount of the second input partial image is similar to the feature amount stored in the newly extracted dictionary, in step S207, the character identification unit 224 identifies the characters indicated by the character information stored in the newly extracted dictionary as the target characters.

このように、文字特定部２２４は、入力部分画像の特徴量と、検出された帳票レイアウトの種類に対応して辞書テーブルに記憶されている部分画像の特徴量とが類似するか否かを判定する。文字特定部２２４は、入力部分画像の特徴量と、検出された帳票レイアウトの種類に対応して記憶されている部分画像の特徴量とが類似する場合、入力帳票画像には、その帳票レイアウトの種類に対応して記憶されている文字情報が記載されているものとする。また、入力部分画像の特徴量と、特定の辞書に含まれる特徴量とが類似しない場合、抽出部２２３は、検出された帳票レイアウトの種類に対応して辞書テーブルに記憶されている他の辞書を抽出する。そして、抽出部２２３は、抽出した他の辞書に含まれる位置情報に基づいて、入力帳票画像から第２入力部分画像を抽出する。文字特定部２２４は、第２入力部分画像の特徴量とその他の辞書に含まれる特徴量とが類似する場合、入力帳票画像には、その他の辞書に含まれる文字情報が記載されているものとする。 In this way, the character identification unit 224 determines whether the feature amount of the input partial image is similar to the feature amount of the partial image stored in the dictionary table for the detected form layout type. When the two feature amounts are similar, the character identification unit 224 regards the input form image as containing the character information stored for that form layout type. When the feature amount of the input partial image is not similar to the feature amount included in a specific dictionary, the extraction unit 223 extracts another dictionary stored in the dictionary table for the detected form layout type. The extraction unit 223 then extracts a second input partial image from the input form image based on the position information included in the extracted other dictionary. When the feature amount of the second input partial image is similar to the feature amount included in that other dictionary, the character identification unit 224 regards the input form image as containing the character information included in that other dictionary.

文字特定部２２４は、特徴量を用いることによって、より精度良く且つより高速に、入力部分画像と部分画像が類似しているか否かを判定することができる。 By using the feature amount, the character identification unit 224 can determine whether or not the input partial image and the partial image are similar with each other more accurately and faster.
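A feature-based similarity test of this kind might be sketched as follows. The text only says that the degree of similarity is, for example, an inner product of feature vectors; scaling both vectors to unit length first (which turns the inner product into a cosine similarity) is an added assumption, as are the threshold of 0.9 and the function names. The type of feature itself is left open here.

```python
import math


def _unit(v):
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def feature_similarity(f1, f2):
    """Inner product of the two unit-normalized feature vectors."""
    return sum(a * b for a, b in zip(_unit(f1), _unit(f2)))


def is_similar(f1, f2, threshold=0.9):
    """Threshold test corresponding to step S206 when features are compared."""
    return feature_similarity(f1, f2) >= threshold
```

Comparing two short vectors costs far fewer operations than correlating two image patches pixel by pixel, which is consistent with the speed advantage the text attributes to feature amounts.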

また、ステップＳ２０７において、文字特定部２２４は、入力部分画像と最も類似する部分画像が記憶された辞書を用いて対象文字を特定してもよい。その場合、文字特定部２２４は、検出された帳票レイアウトの種類に対応する全ての辞書について、各位置情報に対応する入力部分画像又はその特徴量と、各部分画像又はその特徴量との類似の程度を算出する。そして、文字特定部２２４は、類似の程度が最大である部分画像又は特徴量が記憶された辞書を用いて対象文字を特定する。これにより、文字特定部２２４は、より精度良く、検出対象の文字を特定することができる。 In step S207, the character identification unit 224 may instead identify the target characters by using the dictionary that stores the partial image most similar to the input partial image. In that case, for every dictionary corresponding to the detected form layout type, the character identification unit 224 calculates the degree of similarity between the input partial image (or its feature amount) corresponding to each piece of position information and each stored partial image (or its feature amount). The character identification unit 224 then identifies the target characters by using the dictionary that stores the partial image or feature amount with the highest degree of similarity. This allows the character identification unit 224 to identify the characters to be detected more accurately.
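The best-match variant differs from the priority-ordered loop only in that every dictionary is scored before one is chosen. A minimal sketch, with hypothetical field names and injected `crop` and `similarity` callables; applying a minimum threshold to the best score is an assumption, since the text does not say whether one is used in this variant.

```python
def best_dictionary_text(form, dictionaries, crop, similarity, threshold=0.8):
    """Score every dictionary and use the one whose stored image matches best.

    Returns the stored text of the best dictionary, or None when there is no
    dictionary or the best score stays below the (assumed) threshold.
    """
    scored = [
        (similarity(crop(form, d["position"]), d["partial_image"]), d)
        for d in dictionaries
    ]
    if not scored:
        return None
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best["text"] if best_score >= threshold else None
```

Unlike the priority-ordered loop, this version always evaluates all dictionaries, trading extra computation for a choice that does not depend on the order in which the dictionaries are tried.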

また、ステップＳ２０２において、レイアウト検出部２２２は、公知のＯＣＲ技術を利用して、入力帳票画像から文字情報を認識してもよい。その場合、レイアウトテーブルには、各帳票レイアウトの種類に関連付けて、検出対象の文字情報が記載される画像内の位置と、その文字情報のフォーマット（例えば「株式会社」又は「有限会社」を含む等）とが記憶される。レイアウト検出部２２２は、入力帳票画像内の、検出した帳票レイアウトの種類に関連付けられた位置から文字情報を認識する。認識した文字情報が、検出した帳票レイアウトの種類に関連付けられたフォーマットに対応する場合、第２処理回路２２０は、処理をステップＳ２１０へ移行し、認識された文字情報を出力する。一方、認識した文字情報が、検出した帳票レイアウトの種類に関連付けられたフォーマットに対応しない場合、第２処理回路２２０は、ステップＳ２０４〜Ｓ２０９の処理を実行し、辞書を用いて文字情報を特定する。 In step S202, the layout detection unit 222 may also recognize character information from the input form image by using a known OCR technique. In that case, the layout table stores, in association with each form layout type, the position in the image at which the character information to be detected appears and the format of that character information (for example, that it contains "株式会社" (stock company) or "有限会社" (limited company)). The layout detection unit 222 recognizes the character information at the position in the input form image that is associated with the detected form layout type. When the recognized character information conforms to the format associated with the detected form layout type, the second processing circuit 220 shifts the process to step S210 and outputs the recognized character information. On the other hand, when the recognized character information does not conform to the format associated with the detected form layout type, the second processing circuit 220 executes steps S204 to S209 and identifies the character information by using the dictionaries.

レイアウトテーブルは、帳票レイアウトの種類毎に、検出対象の文字情報を検出するための概略的な情報が記憶された全体辞書として機能する。一方、辞書テーブルに記憶される各辞書は、個別の局所領域毎に、検出対象の文字情報を特定するための詳細な情報が記憶された個別辞書として機能する。情報処理装置２００は、全体辞書と個別辞書とを組合せて使用することにより、検出対象の文字情報を精度良く特定することができる。 The layout table functions as an overall dictionary that stores, for each form layout type, coarse information for detecting the character information to be detected. Each dictionary stored in the dictionary table, by contrast, functions as an individual dictionary that stores, for each individual local region, detailed information for identifying the character information to be detected. By using the overall dictionary and the individual dictionaries in combination, the information processing device 200 can identify the character information to be detected accurately.

図７は、登録処理の動作の例を示すフローチャートである。図７に示す動作のフローは、図６に示すフローチャートのステップＳ２１３において実行される。 FIG. 7 is a flowchart showing an example of the operation of the registration process. The flow of the operation shown in FIG. 7 is executed in step S213 of the flowchart shown in FIG.

First, the registration unit 227 determines whether the character information output in step S210 was identified using a dictionary, that is, whether it is the character information identified in step S207 (step S301). If the output character information was not identified using a dictionary, that is, if it was recognized by OCR, the registration unit 227 advances the process to step S303.

On the other hand, if the output character information was identified using a dictionary, the registration unit 227 displays on the display device 203 the position in the input form image indicated by the position information included in that dictionary, thereby notifying the user (step S302). This lets the user see which area of the input form image the incorrectly identified character information was based on, and so decide appropriately which area of the input form image to register as a new dictionary. The information processing device 200 can therefore improve the quality of newly registered dictionaries.

Note that the information processing device 200 may store, in the dictionary table, the input form image corresponding to each dictionary, and the registration unit 227 may display the input form image corresponding to the dictionary in question on the display device 203. This lets the user see what kind of input form image the incorrectly identified character information was based on, and so decide more appropriately which area of the input form image to register as a new dictionary.

Next, the registration unit 227 accepts from the user, via the input device 202, designation of an area in the input form image and of target characters (step S303). The area to be designated is an area containing a distinctive image within the input form image. The target characters to be designated are the characters that should be detected from the input form image. If the user designates an area that contains the target characters, the registration unit 227 can register as a dictionary a partial image in which the actual character information is written, and the character identification unit 224 can identify the characters to be detected more accurately. However, the designated area may be one that does not contain the target characters. The designated area may also be identical to, or overlap, an area already registered in a dictionary.

Next, the registration unit 227 registers a dictionary corresponding to the accepted area and target characters in the dictionary table (step S304) and ends the series of steps. The registration unit 227 cuts out the partial image corresponding to the accepted area from the input form image, or calculates the feature amount of that partial image. The registration unit 227 also determines the position information of the accepted area in the input form image and generates character information representing the accepted target characters. It then generates a dictionary containing the cut-out partial image or the calculated feature amount, the determined position information, and the generated character information. The registration unit 227 assigns the newly generated dictionary a priority lower than that of every existing dictionary. Alternatively, the registration unit 227 may assign the newly generated dictionary a priority higher than that of every existing dictionary. The registration unit 227 registers the generated dictionary in the dictionary table in association with the form layout type detected by the layout detection unit 222.
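The registration in step S304 might be sketched as below. This is a hedged illustration only: `DictionaryEntry`, `DictionaryTable`, and the `lowest` flag are hypothetical names, the feature amount is assumed to be a tuple of floats, and the priority is assumed to be an integer in which a smaller value means a higher priority.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

Region = Tuple[int, int, int, int]  # (x, y, width, height); assumed representation


@dataclass
class DictionaryEntry:
    """Hypothetical individual dictionary: feature, position, text, priority."""
    feature: Tuple[float, ...]
    position: Region
    text: str
    priority: int  # assumption: smaller number = higher priority


@dataclass
class DictionaryTable:
    """Hypothetical dictionary table keyed by form layout type."""
    by_layout: Dict[str, List[DictionaryEntry]] = field(default_factory=dict)

    def register(self, layout_type: str, feature: Sequence[float],
                 position: Region, text: str, lowest: bool = True) -> DictionaryEntry:
        """Add a new dictionary below (default) or above all existing ones."""
        entries = self.by_layout.setdefault(layout_type, [])
        if not entries:
            priority = 0
        elif lowest:
            priority = max(e.priority for e in entries) + 1  # lower than all existing
        else:
            priority = min(e.priority for e in entries) - 1  # higher than all existing
        entry = DictionaryEntry(tuple(feature), position, text, priority)
        entries.append(entry)
        entries.sort(key=lambda e: e.priority)  # keep highest priority first
        return entry
```

Keeping each list sorted by priority means the lookup side can simply iterate from the front; whether new dictionaries go to the back or the front corresponds to the two alternatives the description allows.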

By registering a new dictionary based on an input form image for which recognition or identification failed, the information processing device 200 can accurately identify the characters to be detected when it subsequently processes input form images that have the same form format as that image.

Note that the registration unit 227 may register a plurality of areas for a single set of target characters. In that case, in step S303, the registration unit 227 accepts from the user designation of one set of target characters and a plurality of areas. In step S304, the registration unit 227 generates a dictionary that associates the plurality of partial images, the plurality of feature amounts, and the plurality of pieces of position information corresponding to the designated areas with the character information corresponding to the target characters. In step S205 of FIG. 6, the extraction unit 223 extracts a plurality of input partial images. In step S206, the character identification unit 224 determines whether the plurality of input partial images, or their feature amounts, are similar to the plurality of partial images, or their feature amounts. The character identification unit 224 makes this determination according to whether the average or the weighted sum of the degrees of similarity between each input partial image (or its feature amount) and the corresponding partial image (or its feature amount) is greater than or equal to a threshold.
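The multi-area similarity decision can be illustrated with a short sketch. The description does not fix a particular similarity measure, so cosine similarity between feature vectors is used here purely as an assumption; the function names and the `weights` parameter are likewise hypothetical.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity measure between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def regions_similar(
    input_features: Sequence[Sequence[float]],
    registered_features: Sequence[Sequence[float]],
    threshold: float,
    weights: Optional[Sequence[float]] = None,
) -> bool:
    """Combine per-area similarities by average (default) or weighted sum,
    then compare the combined value with the threshold."""
    if not input_features or len(input_features) != len(registered_features):
        raise ValueError("need the same non-zero number of input and registered areas")
    scores = [cosine_similarity(i, r)
              for i, r in zip(input_features, registered_features)]
    if weights is None:
        combined = sum(scores) / len(scores)
    else:
        if len(weights) != len(scores):
            raise ValueError("one weight per area is required")
        combined = sum(w * s for w, s in zip(weights, scores))
    return combined >= threshold
```

With a weighted sum, an area that is considered more reliable (for example, one that is rarely covered by a seal) can be given a larger weight so that a mismatch in a less reliable area does not by itself cause rejection.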

FIG. 8 is a schematic diagram showing an example of an input form image 800.

As shown in FIG. 8, the input form image 800 is a captured image of an invoice. The input form image 800 contains the company name 801 of the billing source and the company name 802 of the billing destination. In the input form image 800, the billing-source company name 801 is the target characters. A seal 803 is superimposed on the billing-source company name 801.

For example, if in step S203 of FIG. 6 no dictionary corresponding to the form layout type of the input form image 800 is stored in the dictionary table, character information is recognized in step S209. Suppose that the two character strings containing the keyword "会社" (company), namely the billing-source company name 801 and the billing-destination company name 802, are recognized, and that the billing-source company name 801, located on the right, is detected as the target characters. In that case, the character information is judged to be correct in step S212 and the recognition process ends. If, on the other hand, the keyword "会社" in the billing-source company name 801 is not recognized because the seal 803 is superimposed on it, the billing-destination company name 802 is detected as the target characters. In that case, the registration process is executed: for example, the user designates the area 804 containing the billing-source company name 801 and the target characters "XYZ Development Co., Ltd." representing the billing-source company name 801, and a dictionary corresponding to the designated area and target characters is registered in the dictionary table.

On the other hand, if in step S203 a dictionary corresponding to the form layout type of the input form image 800 is stored in the dictionary table, the dictionary with the highest priority is extracted in step S204. In step S205, an input partial image is extracted from the position indicated by the position information included in that dictionary, and in step S206 it is determined whether the input partial image is similar to the partial image included in that dictionary. If the position indicated by the position information in the dictionary is the area 804 containing the billing-source company name 801, the area 804 is extracted as the input partial image. If the input partial image is determined to be similar to the partial image in the dictionary, the character information "XYZ Development Co., Ltd." included in the dictionary is output. In that case, the character information is judged to be correct in step S212 and the recognition process ends. If, on the other hand, the input partial image is determined not to be similar to the partial image because the position of the seal 803 superimposed on the billing-source company name 801 differs from the position of the seal in the partial image included in the dictionary, a dictionary with a lower priority is extracted in step S204.

If no other dictionary exists, character information is recognized in step S209. If recognition of the character information then fails, the registration process is executed. Here, instead of the billing-source company name 801 on which the seal 803 is superimposed, the user can designate an area containing some other feature, such as the area 805 containing the payee account number, together with the target characters "XYZ Development Co., Ltd." representing the billing-source company name 801. In this case too, a dictionary corresponding to the area and target characters designated by the user is newly registered in the dictionary table.

On the other hand, if another dictionary exists, a second input partial image is extracted in step S205 from the position indicated by the position information included in that dictionary, and in step S206 it is determined whether the second input partial image is similar to the partial image included in that dictionary. If the position indicated by the position information in that dictionary is the area 805 containing the payee account, the area 805 containing the payee account number is extracted as the second input partial image. If the second input partial image is determined to be similar to the partial image in that dictionary, the character information "XYZ Development Co., Ltd." included in that dictionary is output. In that case too, the character information is judged to be correct in step S212 and the recognition process ends.
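The priority-ordered lookup walked through above (steps S204 to S207, with a fall-through to OCR in step S209) can be sketched as follows. This is an illustrative sketch under stated assumptions: the `Entry` tuple, the `extract_feature` and `similarity` callables, and the use of a numeric threshold are hypothetical stand-ins for the extraction unit 223 and the character identification unit 224.

```python
from typing import Callable, NamedTuple, Optional, Sequence, Tuple

Region = Tuple[int, int, int, int]  # (x, y, width, height); assumed representation
Feature = Tuple[float, ...]


class Entry(NamedTuple):
    """Hypothetical individual dictionary for one registered area."""
    position: Region
    feature: Feature
    text: str


def identify(
    entries: Sequence[Entry],                       # assumed sorted, highest priority first
    extract_feature: Callable[[Region], Feature],   # reads the input form at a position
    similarity: Callable[[Feature, Feature], float],
    threshold: float,
) -> Optional[str]:
    """Try each dictionary in priority order and return the first match.

    Returns None when no dictionary matches, in which case the caller
    would fall back to OCR (step S209 in the description).
    """
    for entry in entries:                                   # S204: next dictionary
        input_feature = extract_feature(entry.position)     # S205: input partial image
        if similarity(input_feature, entry.feature) >= threshold:  # S206: compare
            return entry.text                               # S207: identified text
    return None
```

In the FIG. 8 example, the first entry would correspond to the area 804 and the second to the area 805; if the seal position makes the first comparison fail, the second entry still yields the company name.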

Also, if character information identified using a registered dictionary turns out to be incorrect, the registration process is likewise executed. In that case, in step S302, the position indicated by the position information included in that dictionary is displayed within the input form image, so the user can designate a distinctive area other than that position. This allows the information processing device 200 to register a higher-quality dictionary.

As described in detail above, the information processing device 200 detects the target characters using individual dictionaries that correspond to the form layout type of the input form image and that define a partial image for which recognition failed in the past, the position information of that partial image, and the character information corresponding to that partial image. This enables the information processing device 200 to output the character information written in the input form image more correctly.

As a result, the information processing device 200 receives fewer corrections to the detected character information from the user via the input device 202, which reduces the processing load on the processor when handling input form images. In addition, when the information processing device 200 transmits information about the detected character string to another information processing device, it receives fewer correction requests for that character string from the other device, which reduces the communication traffic between the information processing device 200 and the other information processing device.

Further, the image processing system 1 enables companies that automate form entry using OCR and RPA (Robotic Process Automation) technology to make that work more efficient and to reduce the workload of the staff responsible. In particular, when the image processing system 1 digitizes forms such as invoices in order to automate tasks such as payment requests, it must identify the billing-source company name accurately. However, the billing-source company name may be printed in a highly stylized special font rather than a common font, or may have a company seal or the like superimposed on it. In such cases, OCR may fail to recognize the billing-source company name correctly. By using individual dictionaries built from partial images for which recognition failed in the past, the information processing device 200 can identify the billing-source company name accurately. The information processing device 200 can therefore classify and sort input form images appropriately by billing-source company name.

In particular, the layout of forms such as invoices and receipts differs from company to company, and the forms of any one company are likely to follow the same form layout. The information processing device 200 stores a plurality of dictionaries in association with form layout types and changes the dictionary it uses according to the form layout of the input form image. This enables the information processing device 200 to detect the target company name accurately.

Further, in the information processing device 200, the user can freely select, from an input form image for which recognition failed, the partial image to be registered as a dictionary. The user can thus select as the partial image either an image containing the target characters or an image containing information that corresponds uniquely to the target characters, such as a bank account, address, telephone number, fax number, or logo. The information processing device 200 can therefore register a higher-quality dictionary.

FIG. 9 is a block diagram showing the schematic configuration of a second processing circuit 230 in an information processing device according to another embodiment.

The second processing circuit 230 executes the recognition process in place of the second processing circuit 220. The second processing circuit 230 has an acquisition circuit 231, a layout detection circuit 232, an extraction circuit 233, a character identification circuit 234, a character recognition circuit 235, an output control circuit 236, a registration circuit 237, and the like.

The acquisition circuit 231 is an example of the acquisition unit and has the same function as the acquisition unit 221. The acquisition circuit 231 acquires the input form image from the image reading device 100 via the second interface device 201 and stores it in the second storage device 210.

The layout detection circuit 232 is an example of the layout detection unit and has the same function as the layout detection unit 222. The layout detection circuit 232 reads the layout table and the input form image from the second storage device 210, detects the form layout type, and stores the detection result in the second storage device 210.

The extraction circuit 233 is an example of the extraction unit and has the same function as the extraction unit 223. The extraction circuit 233 reads the dictionary table, the input form image, and the detection result for the form layout type from the second storage device 210, extracts the input partial image from the input form image, and stores it in the second storage device 210.

The character identification circuit 234 is an example of the character identification unit and has the same function as the character identification unit 224. The character identification circuit 234 reads the dictionary table, the input form image, and the input partial image from the second storage device 210, identifies the character information according to whether the input partial image is similar to the partial image, and stores the identification result in the second storage device 210.

The character recognition circuit 235 is an example of the character recognition unit and has the same function as the character recognition unit 225. The character recognition circuit 235 reads the input form image from the second storage device 210, recognizes character information, and stores the recognition result in the second storage device 210.

The output control circuit 236 is an example of the output control unit and has the same function as the output control unit 226. The output control circuit 236 reads the identification result and the recognition result for the character information from the second storage device 210 and outputs the character information to the display device 203.

The registration circuit 237 is an example of the registration unit and has the same function as the registration unit 227. The registration circuit 237 accepts, from the input device 202, designation of an area in the input form image and of target characters, and registers a dictionary corresponding to the accepted information in the second storage device 210.

As described in detail above, the information processing device can output the character information written in the input form image more correctly when the second processing circuit 230 is used as well.

Preferred embodiments have been described above, but the embodiments are not limited to these. For example, the division of functions between the image reading device 100 and the information processing device 200 is not limited to the example of the image processing system 1 shown in FIG. 1; whether each unit of the image reading device 100 and the information processing device 200 is placed in the image reading device 100 or in the information processing device 200 may be changed as appropriate. Alternatively, the image reading device 100 and the information processing device 200 may be configured as a single device.

For example, the first storage device 110 of the image reading device 100 may store the programs and data stored in the second storage device 210 of the information processing device 200. The first processing circuit 120 of the image reading device 100 may operate as the units implemented by the second processing circuit 220 of the information processing device 200. The image reading device 100 may also have a processing circuit similar to the second processing circuit 230 of the information processing device 200.

In that case, the image reading device 100 has an input device similar to the input device 202 and a display device similar to the display device 203. Because the recognition process is executed in the image reading device 100, the transmission and reception of the form image in steps S102 and S201 are omitted. Steps S202 to S213 are executed by the first processing circuit 120 of the image reading device 100, and operate in the same way as when they are executed by the second processing circuit 220 or the second processing circuit 230 of the information processing device 200. In this case, the image reading device 100 operates as the image processing device.

Further, in the image processing system 1, the first interface device 101 and the second interface device 201 may be connected via a network such as the Internet, a telephone network (including mobile and landline telephone networks), or an intranet. In that case, the first interface device 101 and the second interface device 201 are each provided with a communication interface circuit for the network to which they connect. Also in that case, a plurality of information processing devices may be distributed over the network so that the image processing service can be provided in the form of cloud computing, with the information processing devices cooperating to share the recognition process and other processing. This allows the image processing system 1 to execute the recognition process efficiently on form images read by a plurality of image reading devices.

200 Information processing device
203 Display device
210 Second storage device
221 Acquisition unit
222 Layout detection unit
223 Extraction unit
225 Character recognition unit
227 Registration unit

Claims

An image processing device comprising:
a storage unit that stores, for each of a plurality of form layout types, a partial image for which recognition failed in the past or a feature amount of the partial image, position information of the partial image within the form, and character information corresponding to the partial image;
an acquisition unit that acquires an input form image;
a layout detection unit that detects a form layout type based on the input form image;
an extraction unit that extracts an input partial image from the input form image based on the position information stored in the storage unit for the detected form layout type; and
an output unit that, when the input partial image is similar to the partial image stored in the storage unit for the detected form layout type, or when the feature amount of the input partial image is similar to the feature amount of the partial image stored in the storage unit for the detected form layout type, regards the character information stored in the storage unit for the detected form layout type as being written in the input form image and outputs that character information.

The image processing device according to claim 1, wherein the character information corresponding to the partial image indicates the actual characters written in the partial image.

The image processing device according to claim 1 or 2, further comprising a registration unit that, when no partial image or feature amount, position information, and character information corresponding to the detected form layout type are stored in the storage unit, accepts designation of an area in the input form image and of target characters, and registers in the storage unit, in association with the detected form layout type, the partial image corresponding to the area or the feature amount of that partial image, the position information of the area, and character information representing the target characters.

The image processing device according to claim 3, further comprising a character recognition unit that recognizes character information from the input form image by OCR when no partial image or feature amount, position information, and character information corresponding to the detected form layout type are stored in the storage unit, wherein
the output unit outputs the recognized character information, and
the registration unit accepts designation of an area in the input form image and of target characters when the user judges the recognized character information to be incorrect.

The image processing device according to claim 3 or 4, wherein, when the user judges that the character information stored in the storage unit and output by the output unit is incorrect, the registration unit accepts designation of an area in the input form image and of target characters, and additionally registers in the storage unit, in association with the detected form layout type, the partial image corresponding to the area or the feature amount of that partial image, the position information of the area, and character information representing the target characters.

The image processing device according to any one of claims 1 to 5, wherein
the storage unit stores, for each of the plurality of form layout types, a plurality of information groups each including a partial image for which recognition failed in the past or a feature amount of the partial image, position information of the partial image within the form, and character information corresponding to the partial image,
when the input partial image is not similar to the partial image included in a specific information group stored in the storage unit for the detected form layout type, or the feature amount of the input partial image is not similar to the feature amount included in the specific information group, the extraction unit extracts a second input partial image from the input form image based on the position information included in another information group stored in the storage unit for the detected form layout type, and
when the second input partial image is similar to the partial image included in the other information group, or the feature amount of the second input partial image is similar to the feature amount included in the other information group, the output unit regards the character information included in the other information group as being written in the input form image and outputs that character information.

A control method for an image processing device having a storage unit and an output unit, wherein the image processing device:
stores in the storage unit, for each of a plurality of form layout types, a partial image that failed to be recognized in the past or a feature amount of the partial image, position information of the partial image in a form, and character information corresponding to the partial image;
acquires an input form image;
detects the form layout type based on the input form image;
extracts an input partial image from the input form image based on the position information stored in the storage unit for the detected form layout type; and
when the input partial image and the partial image stored in the storage unit for the detected form layout type are similar, or when the feature amount of the input partial image and the feature amount of the partial image stored in the storage unit for the detected form layout type are similar, deems the character information stored in the storage unit for the detected form layout type to be described in the input form image, and outputs that character information from the output unit.
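The sequence of steps in this method (detect the layout type, extract at the stored position, compare, output) can be sketched using the feature-amount branch. The sketch assumes a coarse intensity histogram as the feature amount and cosine similarity as the comparison, neither of which is specified by the claim. The layout detector is passed in as a caller-supplied function, and `feature`, `cosine`, `process_form`, and the threshold `0.95` are hypothetical.

```python
import math


def crop(image, box):
    """Cut box = (x, y, width, height) out of a 2D list of pixel rows."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def feature(partial, bins=4):
    """Assumed feature amount: normalized histogram of 0-255 intensities."""
    hist = [0] * bins
    for row in partial:
        for p in row:
            hist[min(p * bins // 256, bins - 1)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def process_form(form_image, storage, detect_layout, threshold=0.95):
    """Detect the layout, extract at the stored position, compare, and output."""
    layout_type = detect_layout(form_image)
    entry = storage.get(layout_type)
    if entry is None:
        return None  # nothing registered for this layout type
    candidate = crop(form_image, entry["box"])
    if cosine(feature(candidate), entry["feature"]) >= threshold:
        return entry["text"]  # deemed to be described in the input form image
    return None
```

Storing only the feature amount rather than the partial image keeps the stored entry small, at the cost of a coarser comparison.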

A control program for a computer having a storage unit and an output unit, the control program causing the computer to:
store in the storage unit, for each of a plurality of form layout types, a partial image that failed to be recognized in the past or a feature amount of the partial image, position information of the partial image in a form, and character information corresponding to the partial image;
acquire an input form image;
detect the form layout type based on the input form image;
extract an input partial image from the input form image based on the position information stored in the storage unit for the detected form layout type; and
when the input partial image and the partial image stored in the storage unit for the detected form layout type are similar, or when the feature amount of the input partial image and the feature amount of the partial image stored in the storage unit for the detected form layout type are similar, deem the character information stored in the storage unit for the detected form layout type to be described in the input form image, and output that character information from the output unit.