JP2012194879A

JP2012194879A - Information processing apparatus, information processing method and program

Info

Publication number: JP2012194879A
Application number: JP2011059362A
Authority: JP
Inventors: Masamitsu Ito; Takashi Sawada; Shigehiro Fujitsuka; Tatsuya Mogi
Original assignee: PFU Ltd
Current assignee: PFU Ltd
Priority date: 2011-03-17
Filing date: 2011-03-17
Publication date: 2012-10-11
Also published as: US20120237131A1; CN102708365A

Abstract

PROBLEM TO BE SOLVED: To provide a technique that makes the creation of definition information used by OCR software and the like more efficient.
SOLUTION: An information processing apparatus according to the invention includes: an area recognition part that, among the areas designated by predetermined expressions in image data, recognizes a first area designated by a first area designation expression and a second area designated by a second area designation expression different from the first; a position information acquisition part that acquires the position information of the first area as position information designating an area of the image data to be subjected to character recognition; and an item name acquisition part that acquires character information, obtained by recognizing the characters in the second area, as the item name of the area to be subjected to character recognition designated by the position information acquired by the position information acquisition part.

Description

The present invention relates to an information processing apparatus, an information processing method, and a program.

In recent years, paperless operation has been promoted in many businesses with a view to improving operations and reducing costs. Nevertheless, paper is still used in many situations, for example for transaction documents. OCR (Optical Character Recognition) software has conventionally been used to make such paper-based work more efficient.

To designate a reading area or the like in such OCR software, definition information for that reading area is required. Patent Documents 1 and 2, listed below, disclose techniques relating to such definition information.

Patent Document 1 discloses a technique for reading the character type associated with each color by scanning image data color by color. Patent Document 2 discloses a technique for recognizing attribute information entered in a region enclosed by a frame of a predetermined color and creating an attribute-information definition for each read item.

Patent Document 1: Japanese Unexamined Utility Model Application Publication No. H05-008670
Patent Document 2: Japanese Unexamined Patent Application Publication No. H05-081472

However, with the conventional techniques, when creating definition information for OCR software, the user has to enter manually, for the position information of each reading area acquired from the image data, an item name indicating what that reading area contains.

The present invention has been made in view of this point, and an object of the present invention is to provide a technique that makes the creation of definition information used by OCR software and the like more efficient.

To solve the problem described above, the present invention adopts the following configuration.

That is, the information processing apparatus of the present invention comprises:
an area recognition unit that, among the areas designated by predetermined expressions in image data, recognizes a first area designated by a first area designation expression and a second area designated by a second area designation expression different from the first area designation expression;
a position information acquisition unit that acquires, as position information for designating an area of the image data to be subjected to character recognition, the position information of the first area recognized by the area recognition unit; and
an item name acquisition unit that acquires character information, obtained by recognizing the characters present in the second area recognized by the area recognition unit, as the item name of the area to be subjected to character recognition designated by the position information acquired by the position information acquisition unit.

Here, an area designation expression is an expression used to designate an area, such as a frame, a solid fill, or hatching.

With the above configuration, the first area and the second area in the image data are recognized. Position information for designating the area to be subjected to character recognition is acquired from the first area, and the item name of that area is acquired from the second area. The user therefore no longer needs to enter manually the item name of each area designated by the acquired position information, so the above configuration makes the creation of definition information used by OCR software and the like more efficient.

As another form of the present invention, the information processing apparatus may further comprise
an association unit that associates the first area with the second area,
wherein the item name acquisition unit acquires the character information obtained from the second area as the item name of the area to be subjected to character recognition designated by the position information acquired from the first area that the association unit has associated with that second area.

With this configuration, the position information designating an area to be subjected to character recognition is associated with the item name of that area. The user therefore no longer needs to associate the acquired position information with the item names, which again makes the creation of definition information used by OCR software and the like more efficient.

As another form of the present invention, the association unit may associate a first area with the second area closest to that first area in the image data.
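Nearest-area association can be illustrated with a minimal sketch. The document does not say how "closest" is measured, so this assumes the shortest edge-to-edge gap between axis-aligned bounding boxes; the box representation `(left, top, right, bottom)` and the function names are hypothetical.

```python
import math

# Boxes are hypothetical (left, top, right, bottom) tuples in pixel coordinates.

def gap_distance(a, b):
    """Shortest edge-to-edge distance between two boxes; 0 if they touch or overlap."""
    dx = max(b[0] - a[2], a[0] - b[2], 0)
    dy = max(b[1] - a[3], a[1] - b[3], 0)
    return math.hypot(dx, dy)

def associate_nearest(first_areas, second_areas):
    """Map each first-area index to the index of the closest second area (assumed metric)."""
    if not second_areas:
        return {}
    return {
        i: min(range(len(second_areas)), key=lambda j: gap_distance(f, second_areas[j]))
        for i, f in enumerate(first_areas)
    }

# Two input fields (first areas) with their labels (second areas) to the left.
first = [(200, 100, 400, 130), (200, 160, 400, 190)]
second = [(50, 100, 150, 130), (50, 160, 150, 190)]
print(associate_nearest(first, second))  # {0: 0, 1: 1}
```

Note that this sketch lets two first areas pick the same second area; whether that should be allowed is not specified in the text.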

As another form of the present invention, the association unit may determine whether the positional relationship between the position of a first area and the position of a second area satisfies a predetermined condition, and associate the first and second areas determined to satisfy it.

As another form of the present invention, when the image data contains a plurality of first areas arranged vertically and a plurality of second areas arranged vertically, the association unit may determine that the predetermined condition is satisfied by a first area and a second area lying side by side in the horizontal direction.

As another form of the present invention, when the image data contains a plurality of first areas arranged horizontally and a plurality of second areas arranged horizontally, the association unit may determine that the predetermined condition is satisfied by a first area and a second area lying one above the other in the vertical direction.
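The two alignment conditions above can be sketched as an interval-overlap test. The document does not define the "predetermined condition" numerically; this sketch assumes that a pair is aligned when their extents on the relevant axis overlap by more than half of the smaller extent, and that each second area is used at most once. All names are hypothetical.

```python
# Boxes are hypothetical (left, top, right, bottom) tuples in pixel coordinates.

def overlap(a0, a1, b0, b1):
    """Length of the overlap between intervals [a0, a1] and [b0, b1] (0 if disjoint)."""
    return max(0, min(a1, b1) - max(a0, b0))

def aligned(first, second, stacking):
    """Assumed 'predetermined condition'.

    stacking="vertical": areas form columns, so a pair must sit side by side,
    i.e. their vertical extents overlap by more than half the shorter height.
    stacking="horizontal": areas form rows, so a pair must sit one above the
    other, i.e. their horizontal extents overlap by more than half the narrower width.
    """
    if stacking == "vertical":
        lo, hi = 1, 3  # compare top/bottom
    elif stacking == "horizontal":
        lo, hi = 0, 2  # compare left/right
    else:
        raise ValueError("stacking must be 'vertical' or 'horizontal'")
    shared = overlap(first[lo], first[hi], second[lo], second[hi])
    shorter = min(first[hi] - first[lo], second[hi] - second[lo])
    return shared > 0.5 * shorter

def associate_aligned(first_areas, second_areas, stacking):
    """Pair each first area with the first unused second area that satisfies `aligned`."""
    pairs, used = {}, set()
    for i, f in enumerate(first_areas):
        for j, s in enumerate(second_areas):
            if j not in used and aligned(f, s, stacking):
                pairs[i] = j
                used.add(j)
                break
    return pairs

# Labels in a column to the left of a column of input fields.
first = [(200, 100, 400, 130), (200, 160, 400, 190)]
second = [(50, 100, 150, 130), (50, 160, 150, 190)]
print(associate_aligned(first, second, "vertical"))  # {0: 0, 1: 1}
```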

As another form of the present invention, the association unit may recognize a predetermined correspondence instruction expression present in the image data that indicates the correspondence between a first area and a second area, and associate the first and second areas on the basis of the recognized correspondence.
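One kind of correspondence instruction expression is the same symbol written in both areas. A minimal sketch of the matching step, assuming some earlier (unspecified) recognizer has already read a mark label, or `None`, for each area; the function name and the rule of skipping labels that are not unique on both sides are assumptions.

```python
from collections import Counter

def associate_by_mark(first_marks, second_marks):
    """first_marks[i] / second_marks[j]: mark label read inside each area, or None.

    Returns {first index: second index} for labels that occur exactly once
    on each side; ambiguous or missing marks are left unassociated.
    """
    first_count = Counter(m for m in first_marks if m is not None)
    second_count = Counter(m for m in second_marks if m is not None)
    second_index = {m: j for j, m in enumerate(second_marks) if m is not None}
    return {
        i: second_index[m]
        for i, m in enumerate(first_marks)
        if m is not None and first_count[m] == 1 and second_count[m] == 1
    }

print(associate_by_mark(["A", "B", None], ["B", "A", "C"]))  # {0: 1, 1: 0}
```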

As another form of the present invention, the information processing apparatus may further comprise an item definition information creation unit that creates item definition information including the position information, acquired by the position information acquisition unit, for designating an area to be subjected to character recognition, and the item name of that area acquired by the item name acquisition unit.

Other aspects of the present invention include an information processing method and a program that implement each of the above configurations, and a computer-readable storage medium on which such a program is recorded. A further aspect is an information processing system in which a plurality of devices that together implement the above configurations can communicate with one another.

According to the present invention, it is possible to provide a technique that makes the creation of definition information used by OCR software and the like more efficient.

FIG. 1 illustrates the processing of the information processing apparatus according to the embodiment.
FIG. 2 illustrates the configuration of the information processing apparatus according to the embodiment.
FIG. 3 is a flowchart showing an example of the processing procedure of the information processing apparatus according to the embodiment.
FIG. 4 shows an example of image data processed by the information processing apparatus according to the embodiment.
FIG. 5 shows an example of the scanning order of first areas and second areas.
FIG. 6 shows an example of the association between first areas and second areas.
FIG. 7 shows an example of the association between first areas and second areas.
FIG. 8 shows an example of the association between first areas and second areas.
FIG. 9 shows an example of the association between first areas and second areas.
FIG. 10 shows an example of item definition information obtained from the image data shown in FIG. 4.

Embodiments of an information processing apparatus, an information processing method, a program, and the like according to one aspect of the present invention (hereinafter also referred to as "the present embodiment") are described below. The present embodiment is merely an example, and the present invention is not limited to its configuration.

Although the data used in the present embodiment is described in natural language (e.g., Japanese), in practice it is specified in a computer-recognizable pseudo-language, commands, parameters, machine language, or the like.

§1 Information Processing Apparatus
The information processing apparatus according to the present embodiment is described with reference to FIGS. 1 and 2.

<Overview>
FIG. 1 illustrates processing executed by the information processing apparatus according to the present embodiment. The apparatus recognizes a first area 50 and a second area 60, which are areas designated by predetermined expressions in the image data.

The first area 50 is designated by a first area designation expression, and the second area 60 by a second area designation expression; that is, the two areas use different area designation expressions. An area designation expression is an expression for designating an area, such as a frame, a solid fill, or any of various kinds of hatching. In the example shown in FIG. 1, the first area designation expression is a frame alone: the inside of the frame is neither filled nor hatched. The second area designation expression in the same example is a solid fill.
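To make the frame/fill/hatching distinction concrete, here is a minimal sketch that classifies one designated region by how much of its interior is inked. It assumes a binary mask of designation-expression pixels and a bounding box are already available; the margin and the two thresholds are illustrative guesses, not values from the document, and the function name is hypothetical.

```python
def classify_expression(mask, box, margin=2, frame_max=0.1, fill_min=0.9):
    """Classify a designated region as "frame", "fill", or "hatching".

    mask: 2-D list of 0/1 values, 1 = pixel belongs to a designation expression.
    box: (left, top, right, bottom) with right/bottom exclusive.
    The outline itself is skipped by ignoring `margin` pixels on each side;
    the remaining interior ink ratio decides the class (assumed thresholds).
    """
    left, top, right, bottom = box
    inner = [mask[y][x]
             for y in range(top + margin, bottom - margin)
             for x in range(left + margin, right - margin)]
    if not inner:
        raise ValueError("box too small to classify")
    ratio = sum(inner) / len(inner)
    if ratio <= frame_max:
        return "frame"      # empty interior: outline only
    if ratio >= fill_min:
        return "fill"       # interior almost entirely inked
    return "hatching"       # partial coverage, e.g. stripes

frame = [[1 if x in (0, 9) or y in (0, 9) else 0 for x in range(10)] for y in range(10)]
print(classify_expression(frame, (0, 0, 10, 10)))  # frame
```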

The first area 50 is the area of the image data designated as a target of character recognition. The second area 60 is the area containing the item name for the area designated as a target of character recognition.

For example, the user designates the first area 50 and the second area 60 by drawing a frame, a fill, or hatching on a paper document such as a form or a medical record, using a marker, a sticker, printing, or the like. The information processing apparatus then reads the paper on which the first area 50 and the second area 60 have been designated with a scanner or the like, thereby acquiring image data in which these areas are designated.

The information processing apparatus according to the present embodiment recognizes the first area 50 and the second area 60, which are designated by different area designation expressions. It then acquires, from the first area 50, position information for designating an area to be subjected to character recognition, and acquires, from the second area 60, the item name for that area.

In this way, by acquiring the position information and the item name of each area to be subjected to character recognition from the first and second areas designated on the image data, the information processing apparatus makes it more efficient for the user to create definition information.

Alternatively, the user may designate the first area 50 and the second area 60 by editing the image data itself with drawing software or the like.

<Configuration example>
FIG. 2 shows a configuration example of the information processing apparatus 1 according to the present embodiment. As shown in FIG. 2, the hardware of the information processing apparatus 1 includes a storage unit 11, a control unit 12, an input/output unit 14, and the like, all connected to a bus 13.

The storage unit 11 stores various data and programs (not shown) used in the processing executed by the control unit 12. The storage unit 11 is implemented by, for example, a hard disk, or by a recording medium such as a USB memory.

The data and programs stored in the storage unit 11 may be obtained from a recording medium such as a CD (Compact Disc) or a DVD (Digital Versatile Disc). The storage unit 11 may also be referred to as an auxiliary storage device.

The control unit 12 has one or more processors, such as a microprocessor or a CPU (Central Processing Unit), and peripheral circuits used in the processing of those processors (a ROM (Read Only Memory), a RAM (Random Access Memory), interface circuits, and the like). The control unit 12 implements the processing of the information processing apparatus 1 in the present embodiment by executing the programs stored in the storage unit 11 using the data stored there. The ROM, RAM, and the like may be referred to as a main storage device, in the sense that they are located in the address space handled by the processor in the control unit 12.

The input/output unit 14 is one or more interfaces for exchanging data with devices outside the information processing apparatus 1: for example, an interface for connecting a LAN (Local Area Network) cable, an interface for connecting user-interface devices such as input and output devices, or a USB (Universal Serial Bus) interface.

As shown in FIG. 2, the input/output unit 14 may be connected to, for example, a scanner 2. It may also be connected to a user interface (not shown), i.e., input/output devices such as a touch panel, a numeric keypad, a keyboard, a mouse, and a display. Further, it may be connected to an input/output device for removable recording media, such as a CD drive or a DVD drive, or for non-volatile portable recording media such as memory cards. The input/output unit 14 may also function as an interface (communication unit) for network connection.

The information processing apparatus according to the present embodiment acquires both the position information and the item name of each area to be subjected to character recognition, thereby making the creation of definition information more efficient for the user. This processing is implemented as processing of the control unit 12.

As shown in FIG. 2, to implement this processing the control unit 12 includes an area recognition unit 31, a position information acquisition unit 32, an item name acquisition unit 33, an association unit 34, and an item definition information creation unit 35. These units are implemented, for example, by loading a program stored in the storage unit 11 into the RAM or another peripheral circuit of the control unit 12 and executing it on the processor of the control unit 12.

Among the areas designated by predetermined expressions in the image data, the area recognition unit 31 recognizes a first area designated by the first area designation expression and a second area designated by a second area designation expression different from the first. For example, the area recognition unit 31 recognizes the first area 50 and the second area 60 shown in FIG. 1 and distinguishes between them.

The position information acquisition unit 32 acquires the position information of the first area recognized by the area recognition unit 31, as position information for designating an area of the image data to be subjected to character recognition. As shown in FIG. 1, for example, it acquires the position of the first area 50 within the image data.

The position information acquisition unit 32 may also acquire the position information of the second area for use in the processing of the association unit 34 described later; for example, the position of the second area 60 in the image data shown in FIG. 1.

The item name acquisition unit 33 acquires character information, obtained by recognizing the characters present in the second area recognized by the area recognition unit 31, as the item name of the area to be subjected to character recognition designated by the position information acquired by the position information acquisition unit 32. As shown in FIG. 1, for example, the item name acquisition unit 33 acquires the character information obtained by recognizing the characters in the second area 60 as the item name for the first area 50.

As described later, first areas and second areas are associated with each other by the association unit 34. The item name acquisition unit 33 according to the present embodiment acquires the character information obtained from a second area as the item name of the area to be subjected to character recognition designated by the position information acquired from the first area that the association unit 34 has associated with that second area.

The association unit 34 associates first areas with second areas.

For example, the association unit 34 associates a first area with the second area closest to it in the image data.

As another example, the association unit 34 determines whether the positional relationship between a first area and a second area satisfies a predetermined condition, and associates the first and second areas determined to satisfy it. The predetermined condition specifies the positional relationship that a corresponding pair of first and second areas must have; details are given later.

As a further example, the association unit 34 recognizes a predetermined correspondence instruction expression, present in the image data, that indicates the correspondence between a first area and a second area, and associates the first and second areas on the basis of the recognized correspondence.

A correspondence instruction expression indicates which first area corresponds to which second area. Examples include an arrow drawn between a first area and a second area, a line segment connecting them, and the same symbol or mark written in both. Any expression capable of indicating the correspondence between a first area and a second area may be used.
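For the arrow or line-segment variant, association reduces to checking which areas contain the two ends of each connector. A minimal sketch, assuming the connectors have already been detected as segments `(x1, y1, x2, y2)` by some unspecified line detector; the tolerance value and all names are hypothetical, and arrow direction is ignored.

```python
# Boxes are hypothetical (left, top, right, bottom) tuples in pixel coordinates.

def contains(box, point, tol=0):
    """True if point lies inside box, grown by `tol` pixels on each side."""
    left, top, right, bottom = box
    x, y = point
    return left - tol <= x <= right + tol and top - tol <= y <= bottom + tol

def find_area(areas, point, tol):
    """Index of the single area containing point, or None if none or several do."""
    hits = [i for i, b in enumerate(areas) if contains(b, point, tol)]
    return hits[0] if len(hits) == 1 else None

def associate_by_connectors(first_areas, second_areas, segments, tol=5):
    """Pair a first and a second area whenever one segment links them (either direction)."""
    pairs = {}
    for x1, y1, x2, y2 in segments:
        for p, q in (((x1, y1), (x2, y2)), ((x2, y2), (x1, y1))):
            i = find_area(first_areas, p, tol)
            j = find_area(second_areas, q, tol)
            if i is not None and j is not None:
                pairs[i] = j
                break
    return pairs

first = [(200, 100, 400, 130)]
second = [(50, 300, 150, 330)]
print(associate_by_connectors(first, second, [(210, 128, 140, 302)]))  # {0: 0}
```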

The item definition information creation unit 35 creates item definition information that includes the position information, acquired by the position information acquisition unit 32, for designating an area to be subjected to character recognition, and the item name of that area acquired by the item name acquisition unit 33. The created item definition information specifies the position and item name of each area to be subjected to character recognition, and is used by, for example, OCR software.
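Putting the units together, item definition information is essentially one record per associated first area: its position plus the recognized item name. The sketch below assumes the association result is a `{first index: second index}` dict and stands in for the OCR engine with a caller-supplied `read_text` function; the CSV layout and field names are an assumed format, since the document does not define one.

```python
import csv
import io

FIELDS = ["item_name", "left", "top", "right", "bottom"]  # assumed record layout

def build_item_definitions(first_areas, second_areas, pairs, read_text):
    """first_areas, second_areas: lists of (left, top, right, bottom) boxes.
    pairs: {first index: second index} from an association step.
    read_text: callable box -> str, a hypothetical stand-in for the OCR engine.
    Returns one record per first area that has an associated second area."""
    records = []
    for i, box in enumerate(first_areas):
        if i not in pairs:
            continue  # no item name available for this reading area
        left, top, right, bottom = box
        records.append({
            "item_name": read_text(second_areas[pairs[i]]).strip(),
            "left": left, "top": top, "right": right, "bottom": bottom,
        })
    return records

def to_csv(records):
    """Serialize the records with a header row (csv module default line endings)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

first = [(200, 100, 400, 130), (200, 160, 400, 190)]
second = [(50, 100, 150, 130), (50, 160, 150, 190)]
fake_ocr = {second[0]: "Name ", second[1]: "Address"}  # pretend OCR results
records = build_item_definitions(first, second, {0: 0, 1: 1}, fake_ocr.__getitem__)
print(to_csv(records))
```

Skipping unassociated first areas is a design choice of this sketch; an actual tool might instead keep them with an empty item name for the user to fill in.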

§2 Operation Example
Next, an operation example of the information processing apparatus 1 according to the present embodiment is described with reference to FIG. 3, which shows an example of its processing procedure. In FIG. 3, "step" is abbreviated as "S".

<Start>
First, for example in response to a user operation, a program stored in the storage unit 11 is loaded into the RAM or the like of the control unit 12 and executed by the processor of the control unit 12. The information processing apparatus 1 thereby starts processing.

<Step 101>
Next, the control unit 12 acquires the image data to be processed (step 101). The image data may be, for example, data captured by the scanner 2 shown in FIG. 2, or data stored in the storage unit 11. It may also be acquired via a network, or from a non-volatile portable recording medium such as a memory card.

FIG. 4 shows an example of the image data acquired at this point. The image data is obtained, for example, by digitizing a paper medium such as a form or a medical record. As shown in FIG. 4, the first areas (50a, 50b) and the second areas (60a, 60b) are designated on top of the fields, characters, and the like printed on the form or medical record, and are expressed so that they can be distinguished from those fields and characters.

For example, to distinguish them clearly from the fields and characters of the form or medical record, the first areas (50a, 50b) and the second areas (60a, 60b) may be drawn in a color different from that of those fields and characters. In that case, an OCR engine that detects and reads that different color can extract, from everything drawn in the image data, only the area designation expressions of the first areas (50a, 50b) and the second areas (60a, 60b). For example, if the fields and characters of the form are black, the OCR engine extracts the first areas (50a, 50b) and the second areas (60a, 60b) by detecting and reading colors other than black.
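A simple way to separate colored markings from black print is to look at chroma: black text, white paper, and grey scanner noise all have nearly equal RGB channels, whereas marker ink does not. This is a minimal sketch of that idea on an image given as rows of `(r, g, b)` tuples; the chroma criterion and its threshold are assumptions, not the method of any particular OCR engine.

```python
def color_mask(pixels, min_chroma=60):
    """pixels: 2-D list of (r, g, b) tuples with values 0-255.

    Returns a 2-D list of 0/1, where 1 marks a colored pixel, i.e. one whose
    largest and smallest channels differ by at least `min_chroma` (assumed
    threshold). Black, white, and grey pixels are therefore excluded.
    """
    return [[1 if max(p) - min(p) >= min_chroma else 0 for p in row] for row in pixels]

pixels = [
    [(0, 0, 0), (255, 255, 255), (255, 0, 0)],       # black text, paper, red marker
    [(128, 128, 128), (0, 0, 200), (250, 240, 245)],  # grey noise, blue marker, off-white
]
print(color_mask(pixels))  # [[0, 0, 1], [0, 1, 0]]
```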

However, the first areas (50a, 50b) and the second areas (60a, 60b) need not be drawn in a color different from that of the fields and characters of the form or medical record. They may be drawn in the same color, provided that their area designation expressions can be distinguished from the area designation expressions, such as fields, already printed on the form or medical record.

<Step 102>
Next, as shown in FIG. 3, the control unit 12 recognizes the first areas in the image data acquired in step 101 (step 102).

In the image data shown in FIG. 4, a frame is used as the first area designation expression. In other words, the first areas (50a, 50b) are represented by frames, and the control unit 12 recognizes the first areas (50a, 50b) represented by these frames.

For example, the control unit 12 first extracts the area designation expressions of the first and second areas from everything drawn in the image data. This extraction is possible because the first areas (50a, 50b) and second areas (60a, 60b) are rendered so as to be distinguishable from the fields and characters printed on the form or medical record. Next, the control unit 12 identifies, among the extracted expressions, the areas drawn with the first area designation expression, for example by pattern matching. It then recognizes each identified area as a first area. In this way, the control unit 12 recognizes the first areas (50a, 50b) represented by frames in the image data shown in FIG. 4.
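Below is a minimal sketch of this extract-then-identify flow. It assumes the markup has already been isolated as a boolean pixel mask. The text names pattern matching, but this sketch substitutes a simple, hypothetical rule. A connected component that covers most of its bounding box is treated as a fill (second area); otherwise it is treated as a frame (first area). The function names and the 0.8 ratio are assumptions.

```python
from collections import deque


def connected_components(mask):
    """Group 4-connected True pixels; return one list of (x, y) pixels per component."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    components = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(x, y)]), []
            while queue:  # breadth-first flood fill
                cx, cy = queue.popleft()
                pixels.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < width and 0 <= ny < height and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            components.append(pixels)
    return components


def classify(pixels, fill_ratio=0.8):
    """Hypothetical rule: 'fill' (second area) if the component covers most of
    its bounding box, otherwise 'frame' (first area)."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    box_area = (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)
    return "fill" if len(pixels) / box_area >= fill_ratio else "frame"


# A 6x6 hollow frame and a separate 3x3 solid fill on a 10x6 mask.
mask = [[False] * 10 for _ in range(6)]
for y in range(6):
    for x in range(6):
        mask[y][x] = x in (0, 5) or y in (0, 5)
for y in range(3):
    for x in range(7, 10):
        mask[y][x] = True

print([classify(c) for c in connected_components(mask)])  # ['frame', 'fill']
```

A real filled label area contains black text, which a color mask excludes, so its coverage ratio drops below 1. The threshold would need tuning for actual scans.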

<Step 103>
Next, the control unit 12 acquires the position information, within the image data, of the first areas recognized in step 102 (step 103).

The position information may be any information that indicates a position in the image data. In this embodiment, it is expressed in an xy coordinate system whose origin is the upper-left corner of the image data, with the horizontal axis as the x axis and the vertical axis as the y axis. The representation is not limited to an xy coordinate system, however. For example, a polar coordinate system whose origin is some point in the image data (such as its center) may be used instead.

In this embodiment, the position information of a first area consists of the position (coordinates) of its upper-left corner, its width, and its height. This position information is illustrated in FIG. 9, described later. The control unit 12 determines the coordinates of the upper-left corner of each first area recognized in step 102, together with that area's width and height. In this way, the control unit 12 acquires the position information of each recognized first area within the image data.
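Step 103 can be sketched as follows. The sketch assumes an area is given as the set of pixel coordinates it occupies, and that width and height are counted in pixels, inclusive of both edges. The class and function names are hypothetical. The example box uses the values FIG. 10 lists for the first area 50a. The polar helper illustrates the alternative representation mentioned above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Position information: upper-left corner plus width and height, in pixels.

    Coordinates follow the text: origin at the image's upper-left corner,
    x to the right, y downward.
    """
    left: int
    top: int
    width: int
    height: int


def region_from_pixels(pixels):
    """Bounding box of a recognized area, given its (x, y) pixels."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return Region(min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


def to_polar(x, y, cx, cy):
    """Alternative representation: (r, theta) of (x, y) about an origin (cx, cy).

    Because y grows downward, positive theta runs clockwise on screen.
    """
    return math.hypot(x - cx, y - cy), math.atan2(y - cy, x - cx)


# Corner pixels of a 320 x 30 box whose upper-left corner is at (120, 80).
print(region_from_pixels([(120, 80), (439, 80), (120, 109), (439, 109)]))
# Region(left=120, top=80, width=320, height=30)
```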

<Step 104>
Next, the control unit 12 recognizes the second areas in the image data acquired in step 101 (step 104).

In the image data shown in FIG. 4, filling is used as the second area designation expression. In other words, the second areas (60a, 60b) are represented by filled regions. The control unit 12 recognizes the second areas (60a, 60b) represented by this filling, using the same method it used to recognize the first areas in step 102.

<Step 105>
Next, the control unit 12 acquires the position information, within the image data, of the second areas recognized in step 104 (step 105). Step 105 may be omitted. In this embodiment it is performed because the association in step 107, described later, uses the position information of the second areas. The position information of a second area takes the same form as that of a first area in step 103.

<Step 106>
Next, the control unit 12 performs character recognition on the characters in the second areas recognized in step 104. It thereby acquires character information for those characters (step 106).

Character recognition may be performed by any method. In step 106, the control unit 12 recognizes the characters written in each second area and thereby acquires their character information.

The character information is acquired as the item name of a first area, that is, of an area to be subjected to character recognition. When there is only one first area and one second area, only one combination is possible, so their correspondence does not need to be determined. In other words, there is no need to determine which first area is named by the character information acquired from the second area in step 106. As soon as the character information is acquired in step 106, it is identified as the item name of the first area handled in steps 102 and 103.

When there are multiple first areas and multiple second areas, however, it must be determined which first area is named by the character information acquired from each second area. In this embodiment, this is determined in step 107, described later, by associating the first areas with the second areas.

Such association is not always necessary, however. For example, as illustrated in FIG. 5, suppose that the control unit 12 scans the image data from the top down, recognizing first areas (step 102) and second areas (step 104) as it goes. Suppose also that it repeats steps 102 to 106 each time it has found one first area and one second area. Each pass then handles exactly one first area and one second area, so the association processing is unnecessary.

When the processing is executed in this way in the example shown in FIG. 5, the item names are identified as follows. The character information acquired from the second area 60a becomes the item name of the first area 50a. That from the second area 60b becomes the item name of the first area 50b, and that from the second area 60c becomes the item name of the first area 50c. Depending on the order in which the first and second areas are found, steps 102 to 103 and steps 104 to 106 may be performed in either order.
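The scan-order pairing described above can be sketched as follows. The sketch assumes each area is reduced to a label and the y coordinate of its top edge. It also assumes that first and second areas alternate down the page, as in FIG. 5. The tops in the example are invented for illustration, and the function name is hypothetical.

```python
def pair_in_scan_order(firsts, seconds):
    """Pair areas in the order a top-to-bottom scan finds them.

    firsts, seconds: lists of (label, top) tuples. As soon as the scan holds
    one pending first area and one pending second area, they become a pair,
    whichever of the two was found first.
    """
    events = sorted([(top, "first", label) for label, top in firsts]
                    + [(top, "second", label) for label, top in seconds])
    pending = {"first": None, "second": None}
    pairs = []
    for _, kind, label in events:
        if pending[kind] is not None:
            # Two areas of the same kind in a row: this simple scheme cannot pair them.
            raise ValueError(f"two {kind} areas in a row: {pending[kind]!r}, {label!r}")
        pending[kind] = label
        if pending["first"] is not None and pending["second"] is not None:
            pairs.append((pending["first"], pending["second"]))
            pending = {"first": None, "second": None}
    if pending["first"] is not None or pending["second"] is not None:
        raise ValueError("scan ended with an unpaired area")
    return pairs


# Hypothetical tops: in the middle row the second area is found first.
print(pair_in_scan_order([("50a", 60), ("50b", 100), ("50c", 140)],
                         [("60a", 61), ("60b", 99), ("60c", 141)]))
# [('50a', '60a'), ('50b', '60b'), ('50c', '60c')]
```

In the middle row, 60b is found before 50b. This mirrors the remark that steps 104 to 106 may run before steps 102 to 103.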

<Step 107>
Next, the control unit 12 associates the first areas recognized in step 102 with the second areas recognized in step 104, in order to determine the correspondence between them. Step 107 may be omitted, for example, when there is only one first area and one second area to associate. As described above, step 107 determines which first area is named by the character information acquired from each second area.

Examples of the association processing performed by the control unit 12 are described below with reference to FIGS. 6 to 9.

For example, the control unit 12 associates each first area with the second area closest to it in the image data. FIG. 6 shows an example of this processing. In this embodiment, the position information of the first and second areas was acquired in steps 103 and 105, and it includes the coordinates of the upper-left corner of each area. Using these coordinates, the control unit 12 calculates the distance between each first area and each second area, that is, the distance between their upper-left corners. It then associates each first area with the second area at the shortest distance.

In the example shown in FIG. 6, the control unit 12 associates the first area 50a with the second area 60a, which is the closest to it in the image data. It likewise associates the first area 50b with the second area 60b, which is the closest to it.

The roles of the first and second areas in this processing may be swapped. That is, the control unit 12 may instead associate each second area with the first area closest to it in the image data.
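Below is a minimal sketch of nearest-neighbor association by upper-left corners. It uses Euclidean distance through `math.dist`, which requires Python 3.8 or later. The text does not say whether the matching must be one-to-one, and this sketch assumes it need not be. The function name is hypothetical, and the coordinates are the upper-left corners listed in FIG. 10.

```python
import math


def associate_nearest(firsts, seconds):
    """Map each first area to the second area with the nearest upper-left corner.

    firsts, seconds: dicts mapping an area label to its (left, top) corner.
    Each first area is matched independently, so two first areas may end up
    with the same second area.
    """
    return {
        f_label: min(seconds, key=lambda s_label: math.dist(f_corner, seconds[s_label]))
        for f_label, f_corner in firsts.items()
    }


# Upper-left corners of the areas listed in FIG. 10.
firsts = {"50a": (120, 80), "50b": (120, 120)}
seconds = {"60a": (20, 80), "60b": (20, 120)}
print(associate_nearest(firsts, seconds))  # {'50a': '60a', '50b': '60b'}
```

Calling `associate_nearest(seconds, firsts)` with the arguments swapped gives the reversed variant, which maps each second area to its nearest first area.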

Alternatively, the control unit 12 may determine whether the positional relationship between a first area and a second area satisfies a predetermined condition. It then associates the first and second areas that it determines satisfy the condition.

The predetermined condition constrains the positional relationship between a first area and a second area that correspond to each other.

For example, the predetermined condition may concern the distance between corresponding first and second areas. In that case, the control unit 12 determines that a first area and a second area in the image data satisfy the condition when the distance between them is within a threshold. The user can set and change this threshold.

As another example, the predetermined condition may concern the relative positional relationship between corresponding first and second areas. In that case, the control unit 12 determines that a first area and a second area in the image data satisfy the condition when they are in a specific relative positional relationship. In this embodiment, the relative positional relationship is expressed as a difference vector. Taking the upper-left corner of the image data as the origin, it is the difference between the vector pointing to the upper-left corner of the first area and the vector pointing to the upper-left corner of the second area. The specific relative positional relationship can be expressed as a condition vector that the difference vector should satisfy. For example, suppose the inner product of the difference vector and the condition vector falls within a range of values that the user can set and change. The first and second areas that give that difference vector are then determined to be in the specific relative positional relationship.
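The two conditions above can be sketched as predicates. Several details are assumptions made for illustration, because the text leaves them to the implementation or the user. These are the sign convention of the difference vector (first corner minus second corner), the example condition vector, the range [50, 200], and the function names.

```python
import math


def within_distance(first_corner, second_corner, threshold):
    """Distance condition: the upper-left corners are at most `threshold` apart."""
    return math.dist(first_corner, second_corner) <= threshold


def in_relative_position(first_corner, second_corner, condition, low, high):
    """Relative-position condition.

    Assumes the difference vector is (first corner - second corner), both
    measured from the image's upper-left corner. The pair passes when the
    inner product with the condition vector lies in [low, high].
    """
    dx = first_corner[0] - second_corner[0]
    dy = first_corner[1] - second_corner[1]
    return low <= dx * condition[0] + dy * condition[1] <= high


# Hypothetical rule: the first area (entry field) lies to the right of its second area (label).
to_the_right = (1, 0)
print(in_relative_position((120, 80), (20, 80), to_the_right, 50, 200))    # True
print(in_relative_position((120, 80), (120, 200), to_the_right, 50, 200))  # False
print(within_distance((120, 80), (20, 80), 150))                           # True
```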

As another example, the predetermined condition may concern whether corresponding first and second areas are arranged side by side horizontally. In that case, the image data contains first areas stacked vertically and second areas stacked vertically. Among these, the control unit 12 determines that a first area and a second area lying side by side horizontally satisfy the condition. FIG. 7 illustrates first and second areas that satisfy this condition. In the coordinates (x, y) in FIG. 7, x is the coordinate on the horizontal axis (x axis) and y is the coordinate on the vertical axis (y axis).

In this embodiment, first areas are "stacked vertically" when the x coordinates of their upper-left corners agree within an error range. That range is given by a threshold that the user can set and change. For example, the x coordinates of the first areas 50a, 50b, and 50c shown in FIG. 7 are 70, 68, and 70, respectively. If the threshold is 5, the first areas 50a, 50b, and 50c are therefore stacked vertically.

The same applies to the second areas. In this embodiment, second areas are stacked vertically when the x coordinates of their upper-left corners agree within an error range given by a user-adjustable threshold. For example, the x coordinates of the second areas 60a, 60b, and 60c shown in FIG. 7 are 20, 21, and 19, respectively. If the threshold is 5, the second areas 60a, 60b, and 60c are therefore stacked vertically.

In this way, the control unit 12 obtains the vertically stacked first areas and the vertically stacked second areas. It then determines that, among them, the first and second areas lying side by side horizontally satisfy the predetermined condition.

In this embodiment, a first area and a second area "lie side by side horizontally" when the y coordinates of their upper-left corners differ by no more than a threshold that the user can set and change.

For example, the y coordinates of the first areas 50a, 50b, and 50c shown in FIG. 7 are 59, 98, and 140, respectively. The y coordinates of the second areas 60a, 60b, and 60c are 60, 100, and 141, respectively.

If the threshold is 5, the control unit 12 therefore determines that three pairs lie side by side horizontally and satisfy the predetermined condition: the first area 50a and second area 60a, the first area 50b and second area 60b, and the first area 50c and second area 60c. That is, the control unit 12 associates 50a with 60a, 50b with 60b, and 50c with 60c.
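The horizontal-arrangement test can be sketched with the FIG. 7 coordinates. Two interpretations are assumptions. First, a set of areas counts as stacked when the spread of their x coordinates (maximum minus minimum) is at most the threshold. Second, when several second areas qualify, the one with the smallest y difference is chosen. The function name is hypothetical. Setting `stack_axis=1` exchanges the roles of x and y, which gives the vertical-arrangement condition described next.

```python
def associate_aligned(firsts, seconds, tol=5, stack_axis=0):
    """Associate first and second areas that line up across two parallel stacks.

    firsts, seconds: dicts mapping labels to (x, y) upper-left corners.
    stack_axis=0: each group must share x within `tol` (a vertical stack),
    and pairs must share y within `tol` (side by side, as in FIG. 7).
    stack_axis=1: the roles of x and y are exchanged (one above the other).
    """
    if not firsts or not seconds:
        return {}
    pair_axis = 1 - stack_axis

    def stacked(group):
        coords = [corner[stack_axis] for corner in group.values()]
        return max(coords) - min(coords) <= tol

    if not (stacked(firsts) and stacked(seconds)):
        return {}
    pairs = {}
    for f_label, f_corner in firsts.items():
        gaps = {s_label: abs(f_corner[pair_axis] - s_corner[pair_axis])
                for s_label, s_corner in seconds.items()}
        best = min(gaps, key=gaps.get)  # closest candidate along the pairing axis
        if gaps[best] <= tol:
            pairs[f_label] = best
    return pairs


# Upper-left corners given for FIG. 7.
firsts = {"50a": (70, 59), "50b": (68, 98), "50c": (70, 140)}
seconds = {"60a": (20, 60), "60b": (21, 100), "60c": (19, 141)}
print(associate_aligned(firsts, seconds))
# {'50a': '60a', '50b': '60b', '50c': '60c'}
```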

As another example, the predetermined condition may concern whether corresponding first and second areas are arranged one above the other vertically. In that case, the image data contains first areas lined up horizontally and second areas lined up horizontally. Among these, the control unit 12 determines that a first area and a second area lying one above the other vertically satisfy the condition. FIG. 8 illustrates first and second areas that satisfy this condition. The coordinates (x, y) in FIG. 8 are defined as in FIG. 7.

Whether first areas are lined up horizontally, and whether second areas are lined up horizontally, is determined in the same way as whether a first area and a second area lie side by side horizontally, that is, by comparing y coordinates. Likewise, whether a first area and a second area lie one above the other vertically is determined in the same way as whether first areas (or second areas) are stacked vertically, that is, by comparing x coordinates.

For example, if the threshold is 5, the control unit 12 determines that three pairs in FIG. 8 lie one above the other vertically and satisfy the predetermined condition: the first area 50a and second area 60a, the first area 50b and second area 60b, and the first area 50c and second area 60c. That is, the control unit 12 associates 50a with 60a, 50b with 60b, and 50c with 60c.

As yet another example, the control unit 12 may recognize a predetermined correspondence indication expression in the image data, that is, a mark showing which first area corresponds to which second area. It then associates the first and second areas according to the correspondence that the mark indicates.

A correspondence indication expression shows how first areas and second areas correspond. FIG. 9 illustrates such expressions.

One example is the arrow 70 shown in FIG. 9. The control unit 12 recognizes the arrow 70 in the image data and obtains vector information about the direction in which the arrow points. Using this vector information, the control unit 12 identifies the first area 50a and the second area 60a indicated by the arrow 70 and associates them with each other.

Another example is the line segment 71 shown in FIG. 9. The control unit 12 recognizes the line segment 71 in the image data, identifies the first area 50b and the second area 60b connected by it, and associates them with each other.
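For the arrow 70 and the line segment 71, one simple approach is to look up which area each endpoint of the recognized stroke touches. This approach is assumed here rather than taken from the text. For an arrow, the endpoints are its tail and head, and its direction vector is head minus tail. Stroke detection itself is not shown. The endpoints, the 3-pixel margin, and the function names are hypothetical. The boxes are the FIG. 10 values for 50b and 60b.

```python
def area_at(point, areas, margin=3):
    """Label of the area whose box (grown by `margin` pixels) contains `point`, or None.

    areas: dict mapping labels to (left, top, width, height).
    """
    x, y = point
    for label, (left, top, width, height) in areas.items():
        if (left - margin <= x < left + width + margin
                and top - margin <= y < top + height + margin):
            return label
    return None


def link_by_stroke(end_a, end_b, firsts, seconds):
    """Pair the first and second areas touched by the two ends of an arrow or line.

    Both endpoint orders are tried, so the stroke may be drawn in either
    direction. Returns (first_label, second_label), or None if the stroke
    does not join a first area to a second area.
    """
    for p, q in ((end_a, end_b), (end_b, end_a)):
        first, second = area_at(p, firsts), area_at(q, seconds)
        if first is not None and second is not None:
            return first, second
    return None


# Boxes from FIG. 10; a short line bridges the gap between 60b and 50b.
firsts = {"50b": (120, 120, 320, 30)}
seconds = {"60b": (20, 120, 90, 30)}
print(link_by_stroke((112, 135), (119, 135), firsts, seconds))  # ('50b', '60b')
```

An implementation that uses the arrow's direction could require, for example, that the head land on the first area. The text does not specify this detail.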

A further example is the pair of symbols 72a and 72b shown in FIG. 9. The control unit 12 recognizes the identical symbols 72a and 72b in the image data. It then identifies the first area 50c and the second area 60c to which these symbols are attached, and associates them with each other.
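Once the symbols have been recognized, symbol matching reduces to grouping areas by symbol. This sketch assumes recognition has already produced a symbol for each marked area. The rule that skips ambiguous symbols, the example symbols, and the function name are hypothetical.

```python
def link_by_symbol(first_marks, second_marks):
    """Pair areas that carry the same recognized symbol.

    first_marks, second_marks: dicts mapping an area label to the symbol
    recognized next to it. A symbol attached to more than one second area
    is treated as ambiguous and skipped.
    """
    owners = {}
    for label, symbol in second_marks.items():
        owners.setdefault(symbol, []).append(label)
    return {label: owners[symbol][0]
            for label, symbol in first_marks.items()
            if len(owners.get(symbol, [])) == 1}


print(link_by_symbol({"50c": "*"}, {"60c": "*", "60d": "#"}))  # {'50c': '60c'}
```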

Using the association methods exemplified above, the control unit 12 associates the first areas recognized in step 102 with the second areas recognized in step 104. The control unit 12 may also combine several of these methods to associate the first and second areas.

<Step 108>
Next, the control unit 12 creates item definition information containing the position information acquired in step 103 and the item names acquired in step 106. FIG. 10 illustrates the item definition information generated in step 108 after steps 102 to 107 have been performed on the image data shown in FIG. 4.

As shown in FIG. 10, the first area 50a is associated with the second area 60a, and the first area 50b is associated with the second area 60b.

The x coordinate (Left), y coordinate (Top), horizontal length (Width), and vertical length (Height) of the first area 50a are 120, 80, 320, and 30, respectively. The same values for the first area 50b are 120, 120, 320, and 30. For the second area 60a they are 20, 80, 90, and 30, and for the second area 60b they are 20, 120, 90, and 30.

FIG. 10 illustrates the item definition information obtained from two pairs: the first area 50a with the second area 60a, and the first area 50b with the second area 60b. In the item definition information illustrated in FIG. 10, the "Item name" field stores the character information acquired from the second area. The "Left" and "Top" fields store the x and y coordinates of the upper-left corner of the first area. The "Width" and "Height" fields store the horizontal and vertical lengths of the first area.

Each row (record) of the item definition information holds the information for one pair of corresponding first and second areas. In other words, a record contains the position information of an area to be subjected to character recognition and the item name of that area.

OCR software or the like may read, from each record of the item definition information, the position information of an area to be subjected to character recognition and the item name of that area. In other words, OCR software or the like may use the item definition information to identify the areas it is to recognize.
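On the consuming side, OCR software might use each record to cut its target area out of a newly scanned page before recognizing it. The sketch assumes the new page is aligned with the image from which the definitions were created. The record keys, the function names, and the "Total" item are hypothetical, and the OCR step itself is not shown.

```python
def crop(image, left, top, width, height):
    """Cut a recognition target out of an image stored as a list of pixel rows."""
    return [row[left:left + width] for row in image[top:top + height]]


def recognition_targets(image, records):
    """Yield (item name, cropped area) for each item definition record.

    records: dicts with the keys item_name, left, top, width, height.
    An OCR engine (not shown) would then read the characters in each crop.
    """
    for r in records:
        yield r["item_name"], crop(image, r["left"], r["top"], r["width"], r["height"])


# 4x4 toy "image" whose pixel values encode their position as 10*y + x.
image = [[10 * y + x for x in range(4)] for y in range(4)]
record = {"item_name": "Total", "left": 1, "top": 2, "width": 2, "height": 1}
print(list(recognition_targets(image, [record])))  # [('Total', [[21, 22]])]
```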

The control unit 12 may also show, on a display device connected to the information processing apparatus 1, the position information and item names of the recognition target areas taken from the records of the item definition information. These would be displayed together with the image data from which they were obtained.

<End>
Finally, the control unit 12 stores the item definition information generated in step 108 in, for example, the storage unit 11. The information processing apparatus 1 then ends the processing of this operation example.

<Others>
The processing by which the control unit 12 recognizes the first and second areas in steps 102 and 104 corresponds to the processing of the area recognition unit 31.

The processing by which the control unit 12 acquires position information in step 103 corresponds to the processing of the position information acquisition unit 32.

The processing by which the control unit 12 acquires item names in step 106 corresponds to the processing of the item name acquisition unit 33.

The processing by which the control unit 12 performs the association in step 107 corresponds to the processing of the association unit 34.

The processing by which the control unit 12 creates the item definition information in step 108 corresponds to the processing of the item definition information creation unit 35.

§3 Operation and Effects of the Embodiment
As described above, the information processing apparatus 1 according to this embodiment recognizes the first and second areas in the image data (steps 102 and 104). From the first areas, it acquires position information designating the areas to be subjected to character recognition (step 103). From the second areas, it acquires the item names of those areas (step 106).

Consequently, with the information processing apparatus 1 according to this embodiment, the user no longer needs to enter by hand the item name of each recognition target area designated by the acquired position information. The apparatus therefore makes it more efficient to create the definition information used by OCR software and the like.

In addition, the information processing apparatus 1 according to the present embodiment associates the position information designating each area to be subjected to character recognition with the item name for that area (step 107). The user therefore does not need to associate the acquired position information with the item names, which further improves the efficiency of creating the definition information used by OCR software and the like.
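Steps 103 to 108 (position information from the first areas, item names from the second areas, association, and item definition information) can be sketched roughly as follows. This is a minimal, hypothetical illustration: it assumes the areas have already been recognized and the second areas already OCR'd, and the `Region` type, the centre-to-centre distance, and the dictionary output format are assumptions rather than the applicant's implementation. For association it uses the nearest-area rule, which is one of the options listed in the claims.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical axis-aligned box in image coordinates; (x, y) is the top-left corner."""
    x: float
    y: float
    width: float
    height: float
    text: str = ""  # OCR result; only used for second areas (item names)

    def center(self) -> tuple[float, float]:
        return (self.x + self.width / 2, self.y + self.height / 2)


def distance(a: Region, b: Region) -> float:
    """Euclidean distance between the centres of two regions."""
    (ax, ay), (bx, by) = a.center(), b.center()
    return math.hypot(ax - bx, ay - by)


def associate_nearest(first_areas, second_areas):
    """Step 107 (nearest-area rule): pair each first area with the closest second area."""
    if not second_areas:
        return []
    return [(f, min(second_areas, key=lambda s: distance(f, s))) for f in first_areas]


def build_item_definitions(first_areas, second_areas):
    """Step 108: one entry per first area, holding its item name and OCR target position."""
    return [
        {
            "item_name": second.text.strip(),                           # from the second area
            "position": (first.x, first.y, first.width, first.height),  # from the first area
        }
        for first, second in associate_nearest(first_areas, second_areas)
    ]


# Two input fields (first areas) with printed labels (second areas) to their left.
fields = [Region(200, 100, 300, 40), Region(200, 160, 300, 40)]
labels = [Region(50, 100, 120, 40, "Name"), Region(50, 160, 120, 40, "Address")]
for entry in build_item_definitions(fields, labels):
    print(entry)
# {'item_name': 'Name', 'position': (200, 100, 300, 40)}
# {'item_name': 'Address', 'position': (200, 160, 300, 40)}
```

With an irregular layout, the nearest-area rule can assign two first areas to the same label. The claims also cover alternatives for such cases: a positional condition, or an explicit arrow, line segment or symbol that links the two areas.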

§4 Supplement
Although the embodiment of the present invention has been described in detail above, the foregoing description is in all respects merely illustrative of the present invention and is not intended to limit its scope. Needless to say, various improvements and modifications can be made without departing from the scope of the present invention.

From the description of the present embodiment, a person skilled in the art can implement an equivalent scope on the basis of the claims and common general technical knowledge. Unless otherwise stated, the terms used in this specification have the meanings normally used in the relevant field. Thus, unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by a person of ordinary skill in the art to which the present invention belongs. In the event of a conflict, the terms used herein are to be understood in the meanings set forth in this specification (including its definitions).

DESCRIPTION OF SYMBOLS
1 Information processing apparatus
2 Scanner
11 Storage unit
12 Control unit
13 Bus
14 Input/output unit
31 Area recognition unit
32 Position information acquisition unit
33 Item name acquisition unit
34 Association unit
35 Item definition information creation unit
50, 50a, 50b, 50c First area
60, 60a, 60b, 60c Second area
70 Correspondence instruction expression (arrow)
71 Correspondence instruction expression (line segment)
72a, 72b Correspondence instruction expression (symbol)

Claims

An information processing apparatus comprising:
an area recognition unit that recognizes, among areas designated by a predetermined expression in image data, a first area designated by a first area designation expression and a second area designated by a second area designation expression different from the first area designation expression;
a position information acquisition unit that acquires position information of the first area recognized by the area recognition unit as position information designating an area in the image data to be subjected to character recognition; and
an item name acquisition unit that acquires character information, obtained by recognizing characters present in the second area recognized by the area recognition unit, as an item name for the area to be subjected to character recognition designated by the position information acquired by the position information acquisition unit.

The information processing apparatus according to claim 1, further comprising an association unit that associates the first area with the second area, wherein the item name acquisition unit acquires the character information obtained from the second area as the item name for the area to be subjected to character recognition designated by the position information acquired from the first area that the association unit has associated with that second area.

The information processing apparatus according to claim 2, wherein the association unit associates the first area with the second area that is closest to the first area in the image data.

The information processing apparatus according to claim 2, wherein the association unit determines whether a positional relationship between the position of the first area and the position of the second area satisfies a predetermined condition, and associates the first area and the second area determined to satisfy the predetermined condition.

The information processing apparatus according to claim 4, wherein, when the image data contains a plurality of first areas arranged in the vertical direction and a plurality of second areas arranged in the vertical direction, the association unit determines that one first area and one second area arranged side by side in the horizontal direction satisfy the predetermined condition.

The information processing apparatus according to claim 4, wherein, when the image data contains a plurality of first areas arranged in the horizontal direction and a plurality of second areas arranged in the horizontal direction, the association unit determines that one first area and one second area arranged one above the other in the vertical direction satisfy the predetermined condition.

The information processing apparatus according to claim 2, wherein the association unit recognizes a predetermined correspondence instruction expression, present in the image data, that indicates the correspondence between the first area and the second area, and associates the first area with the second area based on the recognized correspondence.

The information processing apparatus according to claim 1, further comprising an item definition information creation unit that creates item definition information including the position information, acquired by the position information acquisition unit, designating the area to be subjected to character recognition, and the item name, acquired by the item name acquisition unit, for the area designated by that position information.

An information processing method in which a computer executes:
recognizing, among areas designated by a predetermined expression in image data, a first area designated by a first area designation expression and a second area designated by a second area designation expression different from the first area designation expression;
acquiring position information of the recognized first area as position information designating an area in the image data to be subjected to character recognition; and
acquiring character information, obtained by recognizing characters present in the recognized second area, as an item name for the area to be subjected to character recognition designated by the acquired position information.

A program for causing a computer to execute:
recognizing, among areas designated by a predetermined expression in image data, a first area designated by a first area designation expression and a second area designated by a second area designation expression different from the first area designation expression;
acquiring position information of the recognized first area as position information designating an area in the image data to be subjected to character recognition; and
acquiring character information, obtained by recognizing characters present in the recognized second area, as an item name for the area to be subjected to character recognition designated by the acquired position information.
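The two positional conditions in the claims can be illustrated with a short sketch. In the first, first areas and second areas are each stacked vertically and are paired when they sit side by side on the same row. In the second, first areas and second areas are each arranged horizontally and are paired when one sits above the other in the same column. This is a hypothetical sketch. The claims do not say how "arranged in the horizontal (or vertical) direction" is tested, so using overlap of the boxes' extents along one axis is an assumption, and the `Box` type and its `label` field exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Box:
    label: str  # hypothetical identifier used only to report pairs
    x: float
    y: float
    width: float
    height: float


def overlap_y(a: Box, b: Box) -> float:
    """Overlap of the vertical extents; > 0 means the boxes share a row."""
    return max(0.0, min(a.y + a.height, b.y + b.height) - max(a.y, b.y))


def overlap_x(a: Box, b: Box) -> float:
    """Overlap of the horizontal extents; > 0 means the boxes share a column."""
    return max(0.0, min(a.x + a.width, b.x + b.width) - max(a.x, b.x))


def pair_by_overlap(first_areas, second_areas, overlap: Callable[[Box, Box], float]):
    """Pair each first area with the second area it overlaps most along one axis.

    A first area with no overlapping second area is left unpaired, i.e. the
    (assumed) predetermined condition is not satisfied for it.
    """
    pairs = []
    for first in first_areas:
        best = max(second_areas, key=lambda s: overlap(first, s), default=None)
        if best is not None and overlap(first, best) > 0:
            pairs.append((first.label, best.label))
    return pairs


# Vertically stacked fields, each with its label to the left on the same row.
rows_first = [Box("F1", 200, 100, 300, 40), Box("F2", 200, 160, 300, 40)]
rows_second = [Box("S1", 50, 100, 120, 40), Box("S2", 50, 160, 120, 40)]
print(pair_by_overlap(rows_first, rows_second, overlap_y))  # [('F1', 'S1'), ('F2', 'S2')]

# Horizontally arranged fields, each with its label directly above in the same column.
cols_first = [Box("F1", 50, 90, 150, 40), Box("F2", 250, 90, 150, 40)]
cols_second = [Box("S1", 50, 50, 120, 30), Box("S2", 250, 50, 120, 30)]
print(pair_by_overlap(cols_first, cols_second, overlap_x))  # [('F1', 'S1'), ('F2', 'S2')]
```

Passing the overlap test as a parameter lets one routine cover both layouts. A real implementation would probably also need a tolerance for skewed scans, which this sketch omits.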