JP2013182459A

JP2013182459A - Information processing apparatus, information processing method, and program

Info

Publication number: JP2013182459A
Application number: JP2012046408A
Authority: JP
Inventors: Tatsuya Mogi; 達也毛木; Takashi Sawada; 敬澤田; Masamitsu Ito; 修光伊藤
Original assignee: PFU Ltd
Current assignee: PFU Ltd
Priority date: 2012-03-02
Filing date: 2012-03-02
Publication date: 2013-09-12

Abstract

PROBLEM TO BE SOLVED: To provide a technique capable of reducing the time and labor required to prepare definition data for a form. SOLUTION: An information processing device according to one aspect of the present invention includes: an acquisition part for acquiring image data of a first form, and image data of a second form which is the same type of form as the first form and includes information not existing in the first form; an extracting part for extracting an area in the second form including the information from a difference between the image data of the first form and the image data of the second form; and a definition preparing part for preparing definition data used for reading processing for the same type of form as the first form and the second form, in which a position of the extracted area in the second form is set as a position for reading information.

Description

The present invention relates to an information processing apparatus, an information processing method, and a program.

Patent Document 1 discloses a technique for extracting handwritten image information, that is, information on the handwritten portion, from a paper document that has a predefined form and contains handwritten entries.

Japanese Unexamined Patent Application Publication No. 2005-346459 (JP-A-2005-346459)

Conventionally, users have manually created the definition data used for form recognition processing. To create this definition data, a user had to acquire knowledge about how to create it. The conventional technique therefore has the problem that creating definition data for a form takes considerable effort.

In one aspect, the present invention has been made in view of this problem, and its object is to reduce the effort required to create definition data for a form.

An information processing apparatus according to one aspect of the present invention includes: an acquisition unit that acquires image data of a first form and image data of a second form, the second form being the same type of form as the first form and containing information that does not exist in the first form; an extraction unit that extracts, from the difference between the image data of the first form and the image data of the second form, an area of the second form in which the information exists; and a definition creation unit that creates definition data used for reading forms of the same type as the first and second forms, in which the position of the extracted area in the second form is set as an information reading position.

With the information processing apparatus according to this aspect, the definition data used for recognition processing of forms of the same type as the first and second forms is created using information obtained from the difference between the first form and the second form.

The user therefore no longer needs to manually create the definition data used for form recognition processing. To create definition data, the user only needs to prepare image data of the first form and the second form, so the user can create definition data without any knowledge of how definition data is written. As a result, the information processing apparatus according to this aspect makes it possible to reduce the effort required to create definition data for the form.

As another form of the information processing apparatus, the definition creation unit may set, in the definition data, an attribute of the information existing in the extracted area as an attribute of that area, the attribute being obtained by applying a predetermined recognition process to the extracted area.

As another form of each of the above information processing apparatuses, the extraction unit may identify, in the first form or the second form, a print area in which characters are printed and which lies near the position of the extracted area, and the definition creation unit may set the name indicated by the characters acquired from the print area in the definition data as the item name of the extracted area.

As another form of the information processing apparatus, the definition creation unit may refer to dictionary data in which names used as item names of areas in forms are registered. If the name indicated by the characters acquired from the print area is not registered in the referenced dictionary data, the definition creation unit may correct that name to a registered name similar to it and set the corrected name in the definition data as the item name of the extracted area.

As another form of each of the above information processing apparatuses, the acquisition unit may further acquire image data of a third form, which is the same type of form as the first and second forms and contains information that does not exist in either of them. The extraction unit may further extract, from the difference between the image data of the first form and the image data of the third form, an area of the third form containing information that does not exist in the first and second forms. If the area extracted from the image data of the third form overlaps the area extracted from the image data of the second form, the definition creation unit may correct the information reading position that was set from the area extracted from the second form to the position of an area that contains the areas extracted from the image data of both the second and third forms.

Other aspects of each of the above information processing apparatuses may be an information processing method that implements the configurations described above, a program, or a storage medium, readable by a computer or other device or machine, on which such a program is recorded. Here, a recording medium readable by a computer or the like is a medium that stores information such as programs by electrical, magnetic, optical, mechanical, or chemical action.

According to the present invention, the effort required to create definition data for a form can be reduced.

FIG. 1 illustrates a scene in which definition data is created according to the embodiment.
FIG. 2 illustrates an information processing apparatus according to the embodiment.
FIG. 3 is a flowchart illustrating a processing procedure of the information processing apparatus according to the embodiment.
FIG. 4 is a flowchart illustrating definition data creation processing by the information processing apparatus according to the embodiment.
FIG. 5 is a flowchart illustrating coordinate acquisition processing by the information processing apparatus according to the embodiment.
FIG. 6 is a flowchart illustrating item name acquisition processing by the information processing apparatus according to the embodiment.
FIG. 7 is a flowchart illustrating definition data correction processing by the information processing apparatus according to the embodiment.
FIG. 8 illustrates a display screen for definition data created by the information processing apparatus according to the embodiment.
FIG. 9 is a flowchart illustrating definition data storage processing by the information processing apparatus according to the embodiment.
FIG. 10 illustrates a scene in which definition data is corrected according to the embodiment.
FIG. 11 is a flowchart illustrating a processing procedure of the information processing apparatus according to the embodiment.
FIG. 12 is a flowchart illustrating definition data correction processing, based on an extracted difference, by the information processing apparatus according to the embodiment.

Hereinafter, an embodiment according to one aspect of the present invention (hereinafter also referred to as "this embodiment") will be described with reference to the drawings. The embodiment described below is, in every respect, merely an example of the present invention and is not intended to limit its scope. Needless to say, various improvements and modifications can be made without departing from the scope of the present invention. That is, in implementing the present invention, a specific configuration suited to the embodiment may be adopted as appropriate.

Although the data appearing in this embodiment is described in natural language, it is, more specifically, specified in a pseudo-language, commands, parameters, machine language, or the like that a computer can recognize.

§1 Information processing apparatus
FIG. 1 illustrates a scene in which definition data is created according to the embodiment. The information processing apparatus according to this embodiment acquires image data of a first form 50 and image data of a second form 60 from a device such as a scanner. The second form 60 is the same type of form as the first form 50 and contains information that does not exist in the first form 50.

Here, information existing in a form means characters, symbols, figures, or the like that have been written, stamped, printed, or pasted on the form. Examples include handwritten characters and printed barcodes. In this embodiment, the number of characters is not limited. Hereinafter, the term "characters" covers both a single character and multiple characters.

A state in which information that does not exist in the first form 50 exists in the second form 60 is, for example, a state in which characters or a barcode not entered on the first form 50 exist on the second form 60, or a state in which different characters have been entered in corresponding areas of the first form 50 and the second form 60.

A specific example of a state in which information that does not exist in the first form 50 exists in the second form 60 will be described with reference to FIG. 1. The second form 60 shown in FIG. 1 is a form named “○○書” and is the same type of form as the first form 50. Fields 61 and 62 of the second form 60 correspond to fields 51 and 52 of the first form 50, respectively.

Fields 51 and 61 contain, for example, printed characters (“Item A” in the figure) indicating the item name of the information to be entered in fields 52 and 62. Fields 52 and 62 are where the information indicated by the item name printed in fields 51 and 61 is entered, for example by hand. In FIG. 1, nothing is entered in field 52, while the characters “AAAA” are entered in area 63 of field 62.

In the example shown in FIG. 1, a barcode that is not printed on the first form 50 is printed in area 64 of the second form 60. In this embodiment, the type of barcode printed on a form is not limited. As an example, assume that an NW-7 barcode representing a 12-digit number is printed in area 64 of the second form 60.

In FIG. 1, the information that does not exist in the first form 50 but exists in the second form 60 is, specifically, the characters “AAAA” in area 63 and the barcode in area 64. Note that even if characters other than “AAAA” had been entered in field 52, in this embodiment the characters “AAAA” in area 63 would still be regarded as information that exists in the second form 60 but not in the first form 50.

When the second form 60 contains such information that does not exist in the first form 50, the areas containing that information, such as areas 63 and 64, can be identified from the difference between the first form 50 and the second form 60. The information processing apparatus according to this embodiment extracts, from the difference between the image data of the first form 50 and the image data of the second form 60, the areas of the second form 60 in which such information exists, such as areas 63 and 64. Each extracted area is presumed to be an area in which information is entered (an area in which information exists) in forms of the same type as the first form 50 and the second form 60. In other words, applying a predetermined recognition process to this area is expected to yield that information.
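The difference-based extraction described above can be sketched roughly as follows. This is only an illustrative sketch, not the method disclosed in the embodiment: it assumes the two images are already aligned and binarized into equally sized grids of 0 (blank) and 1 (ink), and the function name `diff_regions` and the choice of 8-connected grouping are hypothetical.

```python
from collections import deque


def diff_regions(first, second):
    """Return bounding boxes (left, top, right, bottom) of connected pixel
    groups that are ink in `second` but blank in `first`."""
    height, width = len(second), len(second[0])
    # Assumption: a pixel is part of the difference if only the second form has ink.
    diff = [[second[y][x] == 1 and first[y][x] == 0 for x in range(width)]
            for y in range(height)]
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not diff[y][x] or seen[y][x]:
                continue
            # Breadth-first search over 8-connected neighbours.
            left, top, right, bottom = x, y, x, y
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and diff[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes


# Toy 6x4 "images": the second form has extra ink to the right of the printed label.
first_form = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
second_form = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(diff_regions(first_form, second_form))
```

A real scan would additionally need deskewing, registration of the two images, and noise filtering before such a pixel-wise comparison becomes meaningful.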

The information processing apparatus according to this embodiment therefore creates definition data, used for reading forms of the same type as the first form 50 and the second form 60, in which the positions of areas 63 and 64 extracted from the second form 60 are set as information reading positions. In other words, the apparatus uses information obtained from the difference between the first form 50 and the second form 60 to create the definition data used for recognition processing of forms of that type. In this embodiment, the position of an area is expressed as coordinates on the image data. The way of expressing the position is not limited to this, however, and may be chosen as appropriate.

Because definition data is created in this way, the user no longer needs to manually create the definition data used for form recognition processing. To create definition data, the user only needs to prepare image data for at least two forms of the same type that have the relationship of the first form 50 and the second form 60. The user can therefore create definition data without any knowledge of how definition data is written. As a result, this embodiment makes it possible to reduce the effort required to create definition data for a form.

In this embodiment, the type of form for which definition data is created is not limited. The image data of a form may be data obtained by digitizing a paper form with a scanner or the like, or data such as a document or image created on a computer.

The image data of the first form 50 and the image data of the second form 60 may also be obtained from a single sheet of paper. For example, the user can obtain the image data of the first form 50 by scanning the form before anything is entered on it. The user can then write characters in a field (for example, field 52) of that same sheet and scan it again to obtain the image data of the second form 60.

The information processing apparatus according to this embodiment creates the definition data used for recognition processing of forms of the same type as the first form 50 and the second form 60 by extracting the difference between the two. It is therefore preferable that the difference between the first form 50 and the second form 60 be clear. That is, the first form 50 is preferably, for example, a blank form in which nothing has been entered in the fields provided for information.

The information processing apparatus according to this embodiment may also identify, in the first form 50 or the second form 60, a print area in which characters are printed and which lies near the position of the area extracted based on the difference. The apparatus may then set the name indicated by the characters acquired from that print area in the definition data as the item name of the extracted area.

For example, the information processing apparatus may search the surroundings of the position in the first form 50 that corresponds to area 63, or the surroundings of the position of area 63 in the second form 60, and identify a print area in which characters are printed. In this case, for example, the area of field 51 is identified as the print area in the first form 50, and the area of field 61 in the second form 60.

The information processing apparatus may recognize the characters “Item A” by executing, for example, OCR (Optical Character Recognition) processing on the print area. The apparatus may then set the recognized characters “Item A” as the item name of the form reading area that is set based on area 63.

In this case, the information processing apparatus may refer to dictionary data in which names used as item names of areas in forms are registered. If the name indicated by the characters acquired from the print area is not registered in the referenced dictionary data, the apparatus may correct that name based on the names that are registered in the dictionary data. The apparatus may then set the corrected name in the definition data as the item name of the area extracted from the difference between the first form 50 and the second form 60.
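One way to picture the dictionary-based correction is a fuzzy string match against the registered names. The sketch below uses Python's standard `difflib.get_close_matches`; the similarity measure, the `cutoff` value, the sample dictionary, and the fallback of keeping the recognized name when nothing similar is registered are all assumptions, since the text does not specify how similarity is judged.

```python
import difflib


def correct_item_name(recognized, dictionary, cutoff=0.6):
    """Return `recognized` if it is registered; otherwise return the most
    similar registered name, or `recognized` unchanged if none is close."""
    if recognized in dictionary:
        return recognized
    # Assumption: difflib's ratio-based similarity stands in for "similar name".
    matches = difflib.get_close_matches(recognized, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else recognized


# Hypothetical dictionary data: names that may be used as item names.
item_names = ["Name", "Address", "Date of birth"]
print(correct_item_name("Addres5", item_names))  # OCR misread of "Address"
```

Raising `cutoff` makes the correction more conservative, which may matter when the dictionary contains many short, similar names.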

The information processing apparatus according to this embodiment may also acquire the name (form name) of forms of the same type as the first form 50 and the second form 60 by searching at least one of the first form 50 and the second form 60 for characters that satisfy a predetermined condition.

For example, the information processing apparatus may search the first form 50 for the characters with the largest font size and acquire the name indicated by those characters as the form name. Because the second form 60 is the same type of form as the first form 50, characters indicating the form name should exist on the second form 60 at the position corresponding to where the form name was found on the first form 50. The apparatus may therefore check whether characters indicating the form name exist at that corresponding position on the second form 60, and thereby judge whether the form name acquired from the first form 50 is in fact the correct form name. Through such processing, in the example shown in FIG. 1, the apparatus may acquire the name “○○書” as the form name.
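The form-name heuristic above can be sketched as follows. The sketch assumes that some earlier step has already produced text blocks with a position and a height for each form; the dictionary keys, the use of block height as a stand-in for font size, and the `tolerance` parameter are hypothetical.

```python
def find_form_name(first_blocks, second_blocks, tolerance=5):
    """Pick the tallest text block in the first form and confirm that the same
    text appears at roughly the same position in the second form."""
    if not first_blocks:
        return None
    # Assumption: block height stands in for font size.
    candidate = max(first_blocks, key=lambda block: block["height"])
    for block in second_blocks:
        near = (abs(block["x"] - candidate["x"]) <= tolerance
                and abs(block["y"] - candidate["y"]) <= tolerance)
        if near and block["text"] == candidate["text"]:
            return candidate["text"]
    # The candidate could not be confirmed on the second form.
    return None


first_form_blocks = [
    {"text": "○○書", "x": 200, "y": 20, "height": 32},
    {"text": "Item A", "x": 40, "y": 100, "height": 12},
]
second_form_blocks = first_form_blocks + [
    {"text": "AAAA", "x": 160, "y": 100, "height": 14},
]
print(find_form_name(first_form_blocks, second_form_blocks))
```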

The information processing apparatus according to this embodiment may also apply a predetermined recognition process to the area extracted based on the difference between the first form 50 and the second form 60. The predetermined recognition process is, for example, OCR processing or barcode recognition processing. The apparatus may set, in the definition data, the attribute of the information existing in the extracted area, obtained by applying the predetermined recognition process, as an attribute of that area.

In the example shown in FIG. 1, for instance, the information processing apparatus can apply OCR processing to area 63 to obtain attributes of the characters entered in area 63 (fields 52 and 62): the type of characters (character type) and the number of characters (character count). Likewise, by applying barcode recognition processing to area 64, it can obtain attributes of the barcode printed in area 64: the type of barcode (barcode type) and the number of characters the barcode represents (character count). The apparatus need not determine in advance which recognition process to apply to each area and may apply multiple recognition processes to every area. In that case, it may determine the attribute values of each area from the result of the recognition process that completed without an error.
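The idea of applying several recognition processes and keeping the one that does not fail can be sketched like this. The two recognizers below are toy stand-ins that operate on strings rather than image data; their names, the `NW7:` prefix convention, and the use of `ValueError` to signal a recognition error are purely hypothetical.

```python
def fake_ocr(content):
    """Stand-in for OCR: accepts plain alphanumeric text only."""
    if not content.isalnum():
        raise ValueError("not readable as text")
    char_type = "alphabetic" if content.isalpha() else "alphanumeric"
    return {"char_type": char_type, "char_count": len(content)}


def fake_barcode(content):
    """Stand-in for barcode recognition: accepts 'NW7:<digits>' only."""
    if not content.startswith("NW7:") or not content[4:].isdigit():
        raise ValueError("not readable as a barcode")
    return {"barcode_type": "NW-7", "char_count": len(content[4:])}


def recognize_attributes(content, recognizers):
    """Try each recognizer in turn and return the first result without an error."""
    for name, recognizer in recognizers:
        try:
            return name, recognizer(content)
        except ValueError:
            continue  # This recognizer failed; try the next one.
    return None, {}


recognizers = [("ocr", fake_ocr), ("barcode", fake_barcode)]
print(recognize_attributes("AAAA", recognizers))
print(recognize_attributes("NW7:123456789012", recognizers))
```

If more than one recognizer could succeed on the same area, some tie-breaking rule would be needed; the text leaves that open, and this sketch simply takes the first success.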

The information processing apparatus according to this embodiment may also use image data of three or more forms to create definition data.

For example, the information processing apparatus may create definition data from image data of three or more forms by treating them as a case in which there are multiple second forms 60. In this case, the apparatus creates the definition data based on the difference between the first form 50 and each second form 60.

As another example, the information processing apparatus may acquire image data of a third form, which is the same type of form as the first form 50 and the second form 60 and contains information that does not exist in either of them. In this case, the apparatus may further extract, from the difference between the image data of the first form 50 and the image data of the third form, an area of the third form containing information that does not exist in the first form 50 and the second form 60. If the area extracted from the image data of the third form overlaps the area extracted from the image data of the second form 60, the apparatus may correct the information reading position that was set from the area extracted from the second form 60 to the position of an area that contains the areas extracted from the image data of both the second form 60 and the third form (see FIGS. 10 to 12, described later).
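The correction of a reading position from an overlapping area can be sketched as a bounding-box union. Areas are represented here as hypothetical `(left, top, right, bottom)` tuples; the function names are illustrative, and appending non-overlapping areas as new reading positions is an assumption that this passage does not state.

```python
def overlaps(a, b):
    """True if two (left, top, right, bottom) boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def merge(a, b):
    """Smallest box that contains both boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def update_read_positions(current, new_regions):
    """Widen existing reading positions that overlap a newly extracted area."""
    updated = list(current)
    for region in new_regions:
        for i, existing in enumerate(updated):
            if overlaps(existing, region):
                updated[i] = merge(existing, region)
                break
        else:
            # Assumption: an area with no overlap is kept as a new reading position.
            updated.append(region)
    return updated


from_second_form = [(10, 10, 50, 20)]
from_third_form = [(40, 12, 70, 22), (100, 100, 120, 110)]
print(update_read_positions(from_second_form, from_third_form))
```

Widening the reading position this way lets a longer entry on the third form enlarge an area that was first estimated from a shorter entry on the second form.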

The following describes an example in which the information processing apparatus 1 according to this embodiment creates, based on the difference between the first form 50 and the second form 60, definition data in which the information reading position (coordinates), information attributes (character type, character count, and so on), item name, and form name are set.
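To make the contents of such definition data concrete, the sketch below models it as a small data structure holding a form name and, for each reading area, coordinates, an item name, and attributes. The class names, field names, coordinate values, and JSON serialization are all hypothetical; the text does not specify a storage format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ReadField:
    """One reading area: position on the image plus what is expected there."""
    x: int
    y: int
    width: int
    height: int
    item_name: Optional[str] = None
    attributes: dict = field(default_factory=dict)


@dataclass
class FormDefinition:
    """Definition data for one type of form."""
    form_name: str
    fields: list = field(default_factory=list)


# Values loosely modelled on FIG. 1; the coordinates are made up.
definition = FormDefinition(form_name="○○書")
definition.fields.append(ReadField(
    x=160, y=100, width=200, height=30, item_name="Item A",
    attributes={"char_type": "alphabetic", "char_count": 4}))
definition.fields.append(ReadField(
    x=400, y=20, width=150, height=40,
    attributes={"barcode_type": "NW-7", "char_count": 12}))

print(json.dumps(asdict(definition), ensure_ascii=False, indent=2))
```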

[Configuration example]
FIG. 2 illustrates the information processing apparatus 1 according to this embodiment. As shown in FIG. 2, the hardware configuration of the information processing apparatus 1 includes a storage unit 11, a control unit 12, an input/output unit 14, a communication unit 15, and the like, all connected to a bus 13.

The storage unit 11 stores the various data and programs (not shown) used in the processing executed by the control unit 12. The storage unit 11 is realized by a storage device such as a hard disk or flash memory.

The storage unit 11 also stores dictionary data 21. The dictionary data 21 registers names that may be used as item names of areas in forms. For example, the dictionary data 21 is a list of such names.

Unlike in this embodiment, the dictionary data 21 need not be held in the information processing apparatus 1. For example, it may be held in another information processing apparatus that the information processing apparatus 1 can access. If the dictionary-based item name correction described later is not performed, the information processing apparatus 1 need not hold the dictionary data 21 or access it.

The control unit 12 includes one or more processors, such as a microprocessor or CPU (Central Processing Unit), and peripheral circuits used in the processing of those processors (ROM (Read Only Memory), RAM (Random Access Memory), interface circuits, and so on). The control unit 12 realizes the processing of the information processing apparatus 1 in this embodiment by executing the programs stored in the storage unit 11 together with the associated data. The ROM, RAM, and the like may be called main storage in the sense that they are placed in the address space handled by the processor in the control unit 12.

As shown in FIG. 2, the control unit 12 includes an acquisition unit 31, an extraction unit 32, and a definition creation unit 33. These units are realized, for example, when a program or the like stored in the storage unit 11 is loaded into the RAM or the like, which is a peripheral circuit of the control unit 12, and executed by the processor of the control unit 12.

The acquisition unit 31 acquires image data of a first form and image data of a second form, which is a form of the same type as the first form and contains information that does not exist in the first form.

Based on the difference between the image data of the first form and the image data of the second form, the extraction unit 32 extracts, from the second form, the area containing the information that does not exist in the first form.

The definition creation unit 33 creates definition data that is used for reading forms of the same type as the first form and the second form, and in which the position of the area extracted from the second form is set as a position for reading information.

The definition creation unit 33 may also set, in the definition data, an attribute of the information existing in the extracted area as an attribute of that area. The attribute is obtained by applying predetermined recognition processing to the area extracted from the second form. Here, the predetermined recognition processing is processing that recognizes the information existing in an area by image analysis, such as OCR processing or barcode recognition processing.

When the information existing in the area is expressed in characters, the definition creation unit 33 can acquire that information without error by applying OCR processing to the area. In this case, the definition creation unit 33 may set, in the definition data, the type of the characters read without error (alphabetic, numeric, and so on), the number of characters read, and the like as attributes of the area.

On the other hand, when the information existing in the area is expressed as a barcode, the definition creation unit 33 can acquire that information without error by applying barcode recognition processing to the area. In this case, the definition creation unit 33 may set, in the definition data, the type of the barcode read without error, the number of characters represented by the barcode, and the like as attributes of the area.

The extraction unit 32 may also identify, in the first form or the second form, a print area in which characters are printed and which lies in the vicinity of the extracted area. In this case, the definition creation unit 33 may set the name indicated by the characters acquired from that print area in the definition data as the item name of the extracted area.

Further, in this case, the definition creation unit 33 may refer to the dictionary data 21. If the name indicated by the characters acquired from the print area is not registered in the dictionary data 21, the definition creation unit 33 may correct that name to a registered name that is similar to it, and set the corrected name in the definition data as the item name of the extracted area.
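The dictionary-based correction described above can be sketched as follows. This is a minimal illustration, not the method of the embodiment: the description does not say how similarity is measured, so the sketch assumes the standard library's `difflib.get_close_matches`, and the function name `correct_item_name`, the `cutoff` value, and the sample dictionary entries are all hypothetical.

```python
from difflib import get_close_matches


def correct_item_name(ocr_name, dictionary, cutoff=0.6):
    """Return ocr_name if it is registered; otherwise the most similar
    registered name. Falls back to the unmodified OCR result when no
    registered name is similar enough (cutoff is an assumed parameter)."""
    if ocr_name in dictionary:
        return ocr_name
    matches = get_close_matches(ocr_name, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else ocr_name


# Hypothetical contents of dictionary data 21.
dictionary_21 = ["Name", "Address", "Phone number", "Date of birth"]

# An OCR result with a dropped character is mapped to a registered name.
corrected = correct_item_name("Addres", dictionary_21)
```

Leaving the OCR result unchanged when nothing is similar is one possible design choice; an implementation could equally flag such names for manual review.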

The acquisition unit 31 may further acquire image data of a third form, which is a form of the same type as the first form and the second form and contains information that does not exist in the first form or the second form. The extraction unit 32 may further extract, from the difference between the image data of the first form and the image data of the third form, the area of the third form containing the information that does not exist in the first form or the second form. When the area extracted from the image data of the third form overlaps the area extracted from the image data of the second form, the definition creation unit 33 may correct the reading position that was set from the area extracted from the second form to the position of an area that contains both the area extracted from the second form and the area extracted from the third form.
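A minimal sketch of this correction, assuming each area is an axis-aligned rectangle given by its upper-left and lower-right coordinates. The `Rect` type and the function names are hypothetical and used only for illustration.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def overlaps(a, b):
    """True when the two rectangles share at least one point."""
    return (a.left <= b.right and b.left <= a.right
            and a.top <= b.bottom and b.top <= a.bottom)


def union(a, b):
    """Smallest rectangle that contains both a and b."""
    return Rect(min(a.left, b.left), min(a.top, b.top),
                max(a.right, b.right), max(a.bottom, b.bottom))


def corrected_reading_position(from_second, from_third):
    """Widen the reading position only when the two extracted areas overlap."""
    if overlaps(from_second, from_third):
        return union(from_second, from_third)
    return from_second


# The entry in the third form is longer than the one in the second form.
position = corrected_reading_position(Rect(10, 10, 50, 20), Rect(40, 12, 90, 22))
```

The effect is that a field whose entries vary in length across sample forms ends up with a reading position large enough for all of them.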

The input/output unit 14 is one or more interfaces for exchanging data with devices external to the information processing apparatus 1. The input/output unit 14 is, for example, an interface for connecting to user interfaces such as input devices and output devices, an interface for a USB connection to a device such as a USB (Universal Serial Bus) memory, or a combination of these interfaces. The input/output unit 14 may be connected, for example, to a user interface not shown (input/output devices such as a touch panel, numeric keypad, keyboard, mouse, and display). The input/output unit 14 may also be connected to the scanner 2. In this case, the information processing apparatus 1 acquires data from the scanner 2 via the input/output unit 14.

The communication unit 15 is one or more interfaces for performing data communication with other devices via a network. When the information processing apparatus 1 and the scanner 2 are connected via a network, the information processing apparatus 1 acquires data from the scanner 2 via the communication unit 15.

The information processing apparatus 1 according to the present embodiment is an apparatus having the above configuration. The information processing apparatus 1 is, for example, a general-purpose computer such as a PC, or a computer in a virtual environment.

§2 Operation Example
Next, an operation example of the information processing apparatus 1 according to the present embodiment is described with reference to FIGS. 3 to 9. The operation example described below is merely one example of the information processing performed by the information processing apparatus 1 according to the present embodiment. The order of the steps may be changed wherever possible, for example when a step does not depend on the result of a step executed before it.

FIG. 3 shows an example of the processing procedure of the information processing apparatus 1 according to the present embodiment. In FIG. 3, "step" is abbreviated as "S". The same abbreviation is used in FIGS. 4 to 7 and 9, and in FIGS. 11 and 12 described later.

First, for example, in response to a user operation, the program stored in the storage unit 11 is loaded into the RAM or the like of the control unit 12. The loaded program is then executed by the processor of the control unit 12. In this way, the information processing apparatus 1 starts processing.

In step 100, the acquisition unit 31 acquires image data of forms. For example, the acquisition unit 31 accepts an operation that selects, from the image data stored in the storage unit 11 or the like, the image data to be used for creating the definition data of a form. Alternatively, the acquisition unit 31 accepts forms read by the scanner 2 connected to the information processing apparatus 1. In the present embodiment, the acquisition unit 31 thereby acquires the image data of the first form 50 and the image data of the second form 60.

In step 200, the first form 50 and the second form 60 are compared, their difference is extracted, and definition data for forms of the same type as the first form 50 and the second form 60 is created based on the extracted difference. A specific example of this definition data creation processing is shown in FIGS. 4 to 6.

FIG. 4 is a flowchart illustrating the definition data creation processing performed by the information processing apparatus 1 according to the present embodiment. In this operation example, the extraction unit 32 extracts, from the second form 60, the area containing information that does not exist in the first form 50, based on the difference between the image data of the first form 50 and the image data of the second form 60 acquired in step 100. Based on the extraction result, the definition creation unit 33 then creates definition data for forms of the same type as the first form 50 and the second form 60. Specifically, the processing is executed as follows.

In step 210, the extraction unit 32 acquires the coordinates of the area of the second form 60 containing information that does not exist in the first form 50. A specific example of this coordinate acquisition processing is shown in FIG. 5.

FIG. 5 is a flowchart illustrating the coordinate acquisition processing performed by the information processing apparatus 1 according to the present embodiment. In this operation example, the difference area between the first form 50 and the second form 60 is extracted based on information about the layout (layout information) acquired from the first form 50, and the coordinates of the extracted area are acquired. Specifically, the processing is executed as follows.

In step 211, the layout information of the first form 50 is acquired. For example, the extraction unit 32 analyzes the image data of the first form 50 and acquires layout information that includes ruled line information, information for identifying areas in which characters and the like are printed (pre-print areas), and information for identifying the areas of figures that satisfy specific conditions, such as circles. The method of identifying ruled lines, pre-print areas, and the areas of figures satisfying specific conditions is not particularly limited and may be selected as appropriate. For example, a ruled line is expressed by the coordinates of its start point, its vertical length (height), its horizontal length (width), its thickness, and the like. A pre-print area and the area of a figure satisfying a specific condition are each expressed by, for example, the upper-left and lower-right coordinates of a rectangle containing the characters or figure. The method of expressing ruled lines, pre-print areas, and the areas of figures satisfying specific conditions is likewise not particularly limited and may be selected as appropriate.
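One possible in-memory representation of this layout information is sketched below. The class and field names are hypothetical. The description only states which quantities express a ruled line (start point, height, width, thickness) and a rectangular area (upper-left and lower-right coordinates), and explicitly leaves the representation open.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# (left, top, right, bottom): upper-left and lower-right corner coordinates.
Box = Tuple[int, int, int, int]


@dataclass
class RuledLine:
    x: int          # start point, horizontal coordinate
    y: int          # start point, vertical coordinate
    height: int     # vertical length
    width: int      # horizontal length
    thickness: int


@dataclass
class LayoutInfo:
    ruled_lines: List[RuledLine] = field(default_factory=list)
    preprint_areas: List[Box] = field(default_factory=list)
    figure_areas: List[Box] = field(default_factory=list)


# Illustrative values only: one horizontal ruled line and one printed label.
layout = LayoutInfo()
layout.ruled_lines.append(RuledLine(x=20, y=100, height=0, width=400, thickness=2))
layout.preprint_areas.append((20, 70, 110, 95))
```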

In step 212, the form name is acquired from a pre-print area of the first form 50. For example, the definition creation unit 33 acquires, as the form name, the name indicated by the characters of the pre-print area in which the characters with the largest font size are printed. Alternatively, for example, the definition creation unit 33 acquires, as the form name, the name indicated by the characters of a pre-print area located at a predetermined position. In the example shown in FIG. 1, the definition creation unit 33 acquires the name "○○書" as the form name.

The definition creation unit 33 acquires the characters printed in a pre-print area by, for example, applying OCR processing to that area. The definition creation unit 33 uses the acquired form name, for example, as the file name of the definition data to be created.

In step 213, the first form 50 and the second form 60 are collated using the layout information of the first form 50. For example, the extraction unit 32 collates the first form 50 with the second form 60 using the layout information of the first form 50 acquired in step 211. In doing so, the extraction unit 32 may judge the first form 50 and the second form 60 to match less closely the more the layout of the first form 50 differs from the layout of the second form 60. The extraction unit 32 may likewise judge the two forms to match less closely the more the second form 60 contains entries and the like that do not exist in the first form 50.

Here, the extraction unit 32 may express the result of collating the first form 50 with the second form 60 as a form match rate, that is, a numerical value within a range running from a value indicating no match at all to a value indicating a complete match. When the form match rate exceeds a threshold for determining that the first form 50 and the second form 60 are the same image data, the extraction unit 32 may determine that the two pieces of image data being processed are identical. In this case, the control unit 12 may display, on a display device (not shown) connected to the information processing apparatus 1, an error message reporting that identical data is being processed.

In step 214, the first form 50 and the second form 60 are collated in more detail. For example, the extraction unit 32 divides the image data of the second form 60 into units of 8×8 pixels. The extraction unit 32 then identifies, in the image data of the second form 60, the ruled lines and areas that correspond to the ruled lines, pre-print areas, and areas of figures satisfying specific conditions indicated by the layout information of the first form 50. The extraction unit 32 thereby identifies the coordinates, on the second form 60, of the ruled lines and areas corresponding to those contained in the first form 50. By dividing the second form 60 into units of 8×8 pixels or the like before collating it with the first form 50, the extraction unit 32 prevents, as far as possible, collation errors caused by local misalignment and the like in the second form 60.
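The division into 8×8-pixel units can be sketched as a generator of block rectangles. This is an illustrative sketch only: the function name `iter_blocks` is hypothetical, and the handling of partial blocks at the right and bottom edges (clipping them to the image) is an assumption, since the description does not address it.

```python
def iter_blocks(width, height, size=8):
    """Yield (left, top, right, bottom) for each size x size block of an image,
    scanning row by row. Edge blocks are clipped to the image bounds."""
    for top in range(0, height, size):
        for left in range(0, width, size):
            right = min(left + size, width) - 1
            bottom = min(top + size, height) - 1
            yield (left, top, right, bottom)


# A 20 x 10 pixel image is covered by 3 x 2 blocks.
blocks = list(iter_blocks(20, 10))
```

Each block can then be collated with the first form independently, which is what allows small local shifts to be absorbed block by block.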

In step 215, based on the result of the detailed collation executed in step 214, the ruled lines corresponding to the ruled lines contained in the first form 50 are erased from the image data of the second form 60 (fixed-form erasure). For example, the extraction unit 32 erases, in each divided area of the image data of the second form 60, the ruled lines corresponding to the ruled lines contained in the first form 50, while determining the thickness of the ruled lines of the second form 60.

In step 216, based on the result of the detailed collation executed in step 214, the printing, figures, and the like that correspond to the characters of the pre-print areas and to the figures satisfying specific conditions contained in the first form 50 are erased from the image data of the second form 60 (non-fixed-form erasure). For example, the extraction unit 32 erases, in each divided area of the image data of the second form 60, the image inside each area corresponding to a pre-print area or to the area of a figure satisfying a specific condition contained in the first form 50.

In steps 215 and 216, everything corresponding to the ruled lines, characters, figures, and the like contained in the image data of the first form 50 is erased from the image data of the second form 60. What remains in the image data of the second form 60 after this erasure processing is therefore, for example, the characters entered in the area 63 and the barcode printed in the area 64, neither of which exists in the first form 50. These correspond to the difference between the first form 50 and the second form 60, that is, to the information that exists in the second form 60 but not in the first form 50.

This operation example assumes that nothing is written in the area of the first form 50 corresponding to the area 63. However, as described above, characters different from the "AAAA" entered in the area 63 may be entered in the area of the first form 50 corresponding to the area 63. In that case, in the present embodiment, the area of the first form 50 corresponding to the area 63 is treated as a pre-print area or as the area of a figure satisfying a specific condition. Consequently, when the erasure processing of step 216 is executed, the "AAAA" entered in the area 63 may be erased even though it is information that does not exist in the first form 50.

In view of this possibility, the extraction unit 32 may execute verification processing to decide whether to perform the erasure before it erases the areas of the image data of the second form 60 that correspond to the pre-print areas and the like of the first form 50.

As this verification processing, the extraction unit 32 may, for example, obtain a match rate indicating how closely a pre-print area or the like of the first form 50 matches the corresponding area of the second form 60. The more the characters entered in the pre-print area or the like of the first form 50 differ from the characters entered in the corresponding area of the second form 60, the lower the match rate becomes.

Accordingly, to verify whether different characters are entered in these areas, the extraction unit 32 may determine whether the match rate of the areas is lower than a predetermined threshold. The threshold is set as appropriate so that it can be verified whether the characters entered in a pre-print area or the like of the first form 50 differ from the characters entered in the corresponding area of the second form 60.

When the extraction unit 32 determines that the match rate is lower than the predetermined threshold, it judges that different characters are entered in the pre-print area or the like of the first form 50 and in the corresponding area of the second form 60. Having made this judgment, the extraction unit 32 skips the erasure processing of step 216 for the area of the second form 60 corresponding to that pre-print area or the like. This makes it possible to avoid the unintended erasure described above. The same applies to content other than characters.
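The verification and the conditional skip of the step 216 erasure can be sketched as follows. This is a minimal sketch under stated assumptions: images are binary and held as nested lists (1 = ink, 0 = background), both forms are already aligned so that the same box addresses corresponding areas, and the match rate is the fraction of equal pixels. The description does not define how the match rate is computed, and the function names and the threshold value 0.9 are hypothetical.

```python
def crop(image, box):
    """Return the pixels of image inside box = (left, top, right, bottom), inclusive."""
    left, top, right, bottom = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def region_match_rate(first, second, box):
    """Fraction of pixels inside box that are equal in both images (0.0 to 1.0)."""
    a, b = crop(first, box), crop(second, box)
    pairs = [(pa, pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    if not pairs:
        raise ValueError("box does not cover any pixels")
    return sum(pa == pb for pa, pb in pairs) / len(pairs)


def erase_if_matching(first, second, box, threshold=0.9):
    """Blank box in second (in place) unless its match rate with first is below
    threshold, in which case the erasure is skipped. Returns True if erased."""
    if region_match_rate(first, second, box) < threshold:
        return False  # different content: keep it as part of the difference
    left, top, right, bottom = box
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            second[y][x] = 0
    return True
```

With this arrangement, content that differs between the two forms survives the erasure and is picked up later as part of the difference, at the cost of occasionally keeping pre-printed content that was scanned poorly.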

In step 217, the coordinates of the areas containing the characters, barcodes, and the like that were not erased in steps 215 and 216 are acquired. As described above, the characters, barcodes, and the like that were not erased in steps 215 and 216 are information that does not exist in the first form 50. The extraction unit 32 identifies the areas of the second form 60 in which this information exists and acquires the coordinates of the identified areas.

For example, the extraction unit 32 identifies a rectangular area covering the characters, barcodes, and the like that were not erased in steps 215 and 216, and acquires the coordinates expressing that rectangular area. The rectangular area is expressed, for example, by the coordinates of its upper-left and lower-right corners. In this way, the extraction unit 32 acquires the coordinates of the areas containing the characters, barcodes, and the like that were not erased in steps 215 and 216.

In the present embodiment, when identifying the rectangular areas covering the characters, barcodes, and the like that were not erased in steps 215 and 216, the extraction unit 32 takes the distances between them into account. For example, the extraction unit 32 determines that items separated by a predetermined distance or more belong to different areas. The extraction unit 32 thereby extracts the area 63 and the area 64 of the second form 60 separately and acquires the coordinates of each.
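The distance-based grouping can be sketched as a greedy merge of bounding boxes. This is an illustration under assumptions, not the procedure of the embodiment: the leftover marks are assumed to be available as individual boxes (for example, one per character or bar cluster), the gap is measured between box edges, and the names `gap`, `group_boxes`, and `max_gap` are hypothetical.

```python
def gap(a, b):
    """Edge-to-edge gap between two boxes (left, top, right, bottom);
    0 when they touch or overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return max(dx, dy)


def group_boxes(boxes, max_gap):
    """Merge boxes whose gap is at most max_gap into enclosing rectangles;
    boxes farther apart end up in separate areas."""
    groups = []
    for box in boxes:
        box = tuple(box)
        merged = True
        while merged:  # a grown box may now reach other groups, so rescan
            merged = False
            for other in groups:
                if gap(box, other) <= max_gap:
                    groups.remove(other)
                    box = (min(box[0], other[0]), min(box[1], other[1]),
                           max(box[2], other[2]), max(box[3], other[3]))
                    merged = True
                    break
        groups.append(box)
    return groups


# Four handwritten characters close together, and a barcode far away.
leftovers = [(10, 10, 18, 20), (20, 10, 28, 20), (30, 10, 38, 20),
             (40, 10, 48, 20), (200, 100, 300, 130)]
areas = group_boxes(leftovers, max_gap=5)
```

In this example the four character boxes collapse into one area and the barcode stays separate, mirroring how the area 63 and the area 64 are distinguished.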

Through this processing, the coordinates of the areas of the second form 60 containing information that does not exist in the first form 50 are acquired. In the example shown in FIG. 1, the coordinates expressing the area 63 and the area 64 are acquired. The coordinate acquisition processing then ends, and the processing proceeds to step 220.

When the extraction unit 32 extracts an area that lies inside a field (field 52, field 62), as the area 63 does, it may acquire the coordinates expressing the field that contains the extracted area instead of the coordinates expressing the extracted area itself. The extraction unit 32 identifies the field containing the extracted area by, for example, referring to the ruled line information included in the layout information created in step 211.

Returning to FIG. 4, in step 220, definition data is created that is used for reading forms of the same type as the first form 50 and the second form 60, and in which the coordinates of the areas extracted from the second form 60 are set as positions for reading information. For example, the definition creation unit 33 prepares empty definition data and sets the coordinates acquired in step 210 in it as positions for reading information. In the example shown in FIG. 1, definition data is created in which the coordinates of the area 63 and of the area 64 are each set as a position at which information on a form of the same type as the first form 50 and the second form 60 is read by OCR processing, barcode recognition processing, or the like.
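Step 220 can be sketched as filling an initially empty structure with reading positions. The description does not specify the format of the definition data, so the dictionary layout, the key names, the coordinate values, and the sample form name below are all hypothetical; JSON is used only as one convenient way to serialize the result.

```python
import json


def create_definition(form_name, areas):
    """Build definition data whose reading positions are the extracted areas.

    areas: iterable of (left, top, right, bottom) rectangles from step 210.
    """
    definition = {"form_name": form_name, "fields": []}  # starts with no fields
    for left, top, right, bottom in areas:
        definition["fields"].append({
            "read_position": {"left": left, "top": top,
                              "right": right, "bottom": bottom},
            "attributes": {},    # to be filled by later recognition steps
            "item_name": None,   # to be filled by the item name search
        })
    return definition


# Illustrative coordinates standing in for the area 63 and the area 64.
definition = create_definition("sample_form", [(120, 80, 260, 110),
                                               (300, 400, 520, 460)])
serialized = json.dumps(definition, indent=2)
```

Leaving `attributes` and `item_name` empty at this point reflects the order of the flow, in which they are set by the recognition and item name steps that follow.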

The definition creation unit 33 may also set, in the definition data being created, the layout information of the first form 50 created in step 211 as the layout information of forms of the same type as the first form 50 and the second form 60. In this case, the definition creation unit 33 may delete part of the layout information of the first form 50 created in step 211 and register the remaining layout information in the definition data. For example, the definition creation unit 33 deletes, from the layout information of the first form 50 created in step 211, the information that identifies areas such as those of characters handwritten on the first form 50. In this way, the definition creation unit 33 may register the layout information in the definition data after removing, as far as possible, areas that may not exist in other forms of the same type as the first form 50.

In step 230, predetermined recognition processing is executed on the areas extracted from the second form 60, such as the area 63 and the area 64. The predetermined recognition processing is OCR processing, barcode recognition processing, or the like. The definition creation unit 33 sequentially executes each configured recognition process, such as OCR processing and barcode recognition processing, on each of the areas extracted from the second form 60.

For example, when the information in the target area is expressed in characters, barcode recognition processing does not succeed (it results in an error), but OCR processing succeeds. Conversely, when the information in the target area is expressed as a barcode, OCR processing for character recognition does not succeed, but barcode recognition processing succeeds.

In steps 240 and 250, when the predetermined recognition processing executed in step 230 succeeds ("YES" in step 240), the definition creation unit 33 sets, in the definition data, the attributes of the information in the target area obtained by that recognition processing as attributes of the target area (step 250). When the predetermined recognition processing executed in step 230 fails ("NO" in step 240), the processing of step 250 is skipped.

In the example shown in FIG. 1, OCR processing succeeds for the area 63. Through this OCR processing, the definition creation unit 33 can determine an attribute concerning the type of the characters entered in the area 63 (character type) and an attribute concerning the number of characters (character count). Specifically, as the result of the OCR processing on the area 63, the definition creation unit 33 acquires the character type attribute value "alphabet" and the character count attribute value "4".

Likewise, barcode recognition succeeds for area 64. Through this barcode recognition, the definition creation unit 33 can identify an attribute relating to the type of barcode printed in area 64 (barcode type) and an attribute relating to the number of characters the barcode encodes (character count). Specifically, the definition creation unit 33 obtains the barcode-type attribute value "NW-7" and the character-count attribute value "12".
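The flow of steps 230 to 250 can be sketched as follows. This is a minimal illustration, not the implementation described in the specification: the recognizers are stand-in stubs that operate on strings rather than image data, the names `ocr_stub`, `barcode_stub`, and `recognize_attributes` are hypothetical, and returning the attributes of the first recognizer that succeeds is an assumption.

```python
from typing import Callable, Optional

# Hypothetical recognizer signature: takes the content of an area and returns
# a dict of attributes on success, or None on failure.
Recognizer = Callable[[str], Optional[dict]]


def ocr_stub(content: str) -> Optional[dict]:
    """Stand-in for OCR: succeeds only on plain alphanumeric text."""
    if not content.isalnum():
        return None
    if content.isalpha():
        char_type = "alphabet"
    elif content.isdigit():
        char_type = "number"
    else:
        char_type = "alphabet and number"
    return {"char_type": char_type, "char_count": len(content)}


def barcode_stub(content: str) -> Optional[dict]:
    """Stand-in for barcode recognition: expects 'BARCODE:<type>:<payload>'."""
    parts = content.split(":")
    if len(parts) != 3 or parts[0] != "BARCODE":
        return None
    return {"barcode_type": parts[1], "char_count": len(parts[2])}


def recognize_attributes(content: str, recognizers: list) -> Optional[dict]:
    """Try each recognizer in turn (step 230) and return the attributes of
    the first one that succeeds (steps 240 and 250)."""
    for recognize in recognizers:
        attributes = recognize(content)
        if attributes is not None:
            return attributes
    return None  # every recognizer failed, so step 250 is skipped


recognizers = [ocr_stub, barcode_stub]
print(recognize_attributes("ABCD", recognizers))
print(recognize_attributes("BARCODE:NW-7:123456789012", recognizers))
```

With the stub inputs above, the first call mirrors area 63 (character type "alphabet", four characters) and the second mirrors area 64 (barcode type "NW-7", twelve characters).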

In step 260, the first form 50 or the second form 60 is searched for print areas around each area extracted from the second form 60, and an item name for the extracted area is acquired from the print areas found. A specific example of this item name acquisition process is shown in FIG. 6.

FIG. 6 is a flowchart illustrating the item name acquisition process performed by the information processing apparatus 1 according to this embodiment. In this operation example, each area extracted from the second form 60 in step 210 is treated as a target area, the surroundings of the target area are searched for print areas, and an item name is acquired from the print areas found. When a plurality of areas were extracted from the second form 60 in step 210, each of them is processed as a target area. Specifically, the process is executed as follows.

In step 261, the first form 50 or the second form 60 is searched for print areas around the target area. For example, the definition creation unit 33 refers to the layout information of the first form 50 and searches for pre-printed areas within a predetermined distance of the target area. Alternatively, for example, the definition creation unit 33 searches the image data of the second form 60 for print areas within a predetermined distance of the target area.

A print area containing the item name that describes the information in a target area is generally located directly above or to the left of that target area. Taking this into account, when searching for pre-printed areas within a predetermined distance of the target area, the definition creation unit 33 may restrict the search range to the region above or to the left of the target area.
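The restricted search of step 261 can be sketched as follows, assuming that areas are axis-aligned rectangles in image coordinates with the y axis pointing downward. The `Box` type, the `find_label_candidates` function, and the requirement that a candidate overlap the target along the other axis are illustrative assumptions, not details given in the specification.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def overlaps_1d(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True when the two intervals share more than a single point."""
    return a_start < b_end and b_start < a_end


def find_label_candidates(target: Box, printed: list, max_distance: int) -> list:
    """Return the printed areas lying above or to the left of the target,
    no farther than max_distance from it."""
    candidates = []
    for box in printed:
        # Above: the box ends at or before the top edge of the target.
        gap_above = target.top - box.bottom
        is_above = (0 <= gap_above <= max_distance and
                    overlaps_1d(box.left, box.right, target.left, target.right))
        # Left: the box ends at or before the left edge of the target.
        gap_left = target.left - box.right
        is_left = (0 <= gap_left <= max_distance and
                   overlaps_1d(box.top, box.bottom, target.top, target.bottom))
        if is_above or is_left:
            candidates.append(box)
    return candidates


target = Box(100, 100, 200, 130)
printed = [
    Box(20, 100, 90, 130),    # immediately to the left: kept
    Box(100, 140, 200, 170),  # below the target: ignored
    Box(0, 100, 30, 130),     # to the left but too far away: ignored
    Box(100, 60, 200, 90),    # immediately above: kept
]
print(find_label_candidates(target, printed, max_distance=20))
```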

In step 262, the definition creation unit 33 determines whether any print area exists around the target area. If one or more print areas exist around the target area, the process proceeds to step 263.

In the example shown in FIG. 1, a pre-printed area containing the text "Item A" exists near area 63. In this step, the definition creation unit 33 finds the area in which "Item A" is printed as a print area near area 63.

On the other hand, if no print area exists around the target area, the item name acquisition process ends. In this case, the definition creation unit 33 cannot acquire an item name from the surroundings of the target area, and may instead determine the item name by any method.

In the example shown in FIG. 1, no area containing a printed item name exists near area 64. The definition creation unit 33 therefore cannot acquire, from the surroundings of area 64, an item name for the information in that area. In this case, the definition creation unit 33 may, for example, assign an item name of the form "(object name)_(serial number)". Here, the definition creation unit 33 sets the item name for the information in area 64 to "barcode_1".
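A fallback name of the form "(object name)_(serial number)" could be generated as in the sketch below. The function name `fallback_item_name` is hypothetical, and incrementing the serial number until an unused name is found is an assumption; the specification only gives the resulting name "barcode_1".

```python
def fallback_item_name(object_name: str, existing_names: set) -> str:
    """Build '<object name>_<serial number>', choosing the smallest serial
    number that does not collide with a name already in use."""
    serial = 1
    while f"{object_name}_{serial}" in existing_names:
        serial += 1
    return f"{object_name}_{serial}"


print(fallback_item_name("barcode", set()))          # barcode_1
print(fallback_item_name("barcode", {"barcode_1"}))  # barcode_2
```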

In step 263, the definition creation unit 33 acquires the character strings contained in the print areas around the target area as item name candidates. The definition creation unit 33 obtains these character strings by applying OCR processing to the print areas around the target area. If a plurality of print areas were found in step 261, a plurality of character strings are acquired as item name candidates.

In step 264, the definition creation unit 33 accesses the dictionary data 21 stored in the storage unit 11, in which names that may be used as item names are registered. If the dictionary data 21 is held by another device that the information processing apparatus 1 can reach over a network, the definition creation unit 33 may access the dictionary data 21 held by that device over the network.

In step 265, the dictionary data 21 is consulted to determine whether a character string that exactly matches each item name candidate acquired in step 263 is registered in the dictionary data 21. Step 266 is applied to item name candidates for which no exactly matching character string is registered ("NO" in step 265). Step 266 is skipped for item name candidates for which an exactly matching character string is registered ("YES" in step 265).

In step 266, the character string of the item name candidate is corrected based on the character strings registered in the dictionary data 21. For example, the definition creation unit 33 replaces the candidate character string with a registered character string that is similar to it. A character string similar to the candidate is, for example, the registered string that shares the most characters with the candidate, a string that matches the candidate as a prefix, or a string that matches the candidate as a suffix. The method of judging similarity to the candidate character string may be chosen as appropriate.
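Steps 265 and 266 can be sketched as follows, using the "most matching characters" criterion. Counting matches position by position is only one possible reading of that criterion, and the function names and dictionary contents are hypothetical.

```python
def count_matching_characters(a: str, b: str) -> int:
    """Count the positions at which the two strings hold the same character."""
    return sum(1 for x, y in zip(a, b) if x == y)


def correct_item_name(candidate: str, dictionary: list) -> str:
    """Return the candidate unchanged when it is registered (step 265, YES);
    otherwise return the registered name sharing the most characters with it
    (step 266). Ties are resolved in favor of the earliest registered name."""
    if not dictionary or candidate in dictionary:
        return candidate
    return max(dictionary,
               key=lambda name: count_matching_characters(candidate, name))


dictionary = ["Name", "Address", "Date of birth"]
print(correct_item_name("Address", dictionary))  # registered, left unchanged
print(correct_item_name("Addre5s", dictionary))  # OCR misread, corrected to "Address"
```

A prefix-match or suffix-match criterion could be substituted for `count_matching_characters` without changing the surrounding flow.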

In step 267, the definition creation unit 33 identifies the item name of the target area from among its item name candidates. When there is only one candidate character string for the target area, the definition creation unit 33 adopts that string as the item name of the target area. When there are several candidate character strings, the definition creation unit 33 selects one of them and adopts the selected string as the item name of the target area. For example, the definition creation unit 33 may present the item name candidates to the user and accept the user's selection of the character string to set as the item name of the target area. The definition creation unit 33 may also exclude candidate character strings that are not registered in the dictionary data before identifying the item name of the target area.

Through these processes, the item name for each area extracted from the second form 60 is acquired. In FIG. 1, for example, the name "Item A" is acquired as the item name of area 63. The item name acquisition process then ends, and the process proceeds to step 270.

Returning to FIG. 4, in step 270, the item name acquired in step 260 is set in the definition data as the item name describing the information in the target area.

Through these processes, definition data is created for forms of the same type as the first form 50 and the second form 60, with the information reading positions (coordinates), information attributes (character type, character count, etc.), item names, and form name set. The definition data creation process then ends. If attributes are not to be set in the definition data, steps 230 to 250 may be omitted. Likewise, if item names are not to be set in the definition data, steps 260 and 270 may be omitted. When the definition data creation process ends, the process proceeds to step 300.

Returning to FIG. 3, in step 300, the control unit 12 executes the definition data correction process. A specific example of this process is shown in FIGS. 7 and 8.

FIG. 7 is a flowchart illustrating the definition data correction process performed by the information processing apparatus 1 according to this embodiment. In this operation example, the definition data created in step 200 is presented to the user. The user then manually corrects the created definition data based on the presented information. Specifically, the process is executed as follows.

In step 301, the definition data created in step 200 is displayed on a display device (not shown) connected to the information processing apparatus 1. An example of the display screen is shown in FIG. 8.

FIG. 8 illustrates a display screen for the definition data created by the information processing apparatus 1 according to this embodiment. In the example shown in FIG. 8, the form used to create the definition data is displayed on the right side of the screen. The displayed form may be either the first form 50 or the second form 60. Alternatively, for example, the control unit 12 may adjust the alpha values of the image data of the first form 50 and the second form 60 and display the two images superimposed, so that both forms are visible.
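Superimposing the two forms by adjusting alpha values corresponds to ordinary alpha blending. The sketch below assumes two same-sized grayscale images held as nested lists of pixel values; the representation and the function name `blend` are illustrative choices, not details taken from the specification.

```python
def blend(first: list, second: list, alpha: float) -> list:
    """Blend two same-sized grayscale images pixel by pixel.
    alpha is the weight of the first image; the second gets 1 - alpha."""
    return [
        [round(alpha * a + (1 - alpha) * b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(first, second)
    ]


first_form = [[200, 0], [100, 100]]
second_form = [[100, 0], [100, 200]]
print(blend(first_form, second_form, alpha=0.5))  # [[150, 0], [100, 150]]
```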

On the left side of the screen, each parameter value of the definition data is displayed in a form that the user can change. For example, the user operates an input device (not shown) connected to the information processing apparatus 1 to enter parameter values or to select parameter values from a pull-down list, thereby correcting the parameter values set in the definition data. The user ends the correction operation by operating the input device to press button 80.

In step 302, the control unit 12 accepts such correction operations from the user. In step 303, the control unit 12 determines whether to end the correction operation. When button 80 has been pressed, the control unit 12 determines that the correction operation is to end ("YES" in step 303) and ends the definition data correction process. When the definition data correction process ends, the process proceeds to step 400. When button 80 has not been pressed, the control unit 12 determines that the correction operation is not to end ("NO" in step 303) and continues to accept correction operations (step 302).

Returning to FIG. 3, in step 400, the definition data storage process is executed. A specific example of this process is shown in FIG. 9.

FIG. 9 is a flowchart illustrating the definition data storage process performed by the information processing apparatus 1 according to this embodiment. In this operation example, it is determined whether the created definition data contains any errors. If it contains no errors, the created definition data is saved. Specifically, the process is executed as follows.

In step 401, the control unit 12 determines whether a plurality of areas share the same item name. For example, the control unit 12 refers to the created definition data and determines whether the same parameter value (name) is set as the item name of more than one area. When the control unit 12 determines that the same parameter value is set as the item name of a plurality of areas ("YES" in step 401), it outputs to the display device an error message indicating that a plurality of areas have the same item name (step 404). When the control unit 12 determines that no item name is shared by a plurality of areas ("NO" in step 401), it advances the process to step 402.

In step 402, the control unit 12 determines whether any abnormal parameter value is set in the created definition data. When the set parameter values satisfy a predetermined condition, for example when a plurality of reading areas overlap one another or when an area set as a reading area lies outside the bounds of the image data, the control unit 12 determines that an abnormal parameter value exists ("YES" in step 402). The control unit 12 then outputs an error message stating why the value was judged abnormal (step 404). When the control unit 12 determines that no abnormal parameter value exists ("NO" in step 402), it advances the process to step 403.

In step 403, the created definition data is saved. For example, the control unit 12 stores the definition data created by the preceding processes in the storage unit 11. The definition data storage process then ends, and the information processing apparatus 1 ends the processing of this operation example.
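The checks of steps 401 and 402 can be sketched as follows. The field representation (a dict with `item_name` and a `box` of left, top, right, bottom coordinates), the function names, and the message wording are assumptions. Unlike the flowchart, which reports an error as soon as one check fails, this sketch collects every problem it finds; definition data would be saved (step 403) only when the returned list is empty.

```python
from collections import Counter
from itertools import combinations


def boxes_overlap(a: tuple, b: tuple) -> bool:
    """True when two (left, top, right, bottom) rectangles share an area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def validate_definition(fields: list, image_width: int, image_height: int) -> list:
    """Return a list of error messages; an empty list means no error."""
    errors = []
    # Step 401: the same item name set for more than one area.
    counts = Counter(field["item_name"] for field in fields)
    for name, count in counts.items():
        if count > 1:
            errors.append(f"duplicate item name: {name}")
    # Step 402: a reading area lying outside the bounds of the image.
    for field in fields:
        left, top, right, bottom = field["box"]
        if left < 0 or top < 0 or right > image_width or bottom > image_height:
            errors.append(f"area outside image: {field['item_name']}")
    # Step 402: reading areas that overlap one another.
    for a, b in combinations(fields, 2):
        if boxes_overlap(a["box"], b["box"]):
            errors.append(f"overlapping areas: {a['item_name']} and {b['item_name']}")
    return errors


fields = [
    {"item_name": "Item A", "box": (10, 10, 50, 30)},
    {"item_name": "barcode_1", "box": (60, 10, 120, 30)},
]
print(validate_definition(fields, image_width=200, image_height=100))  # []
```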

<Others>
Note that the information processing apparatus 1 according to this embodiment may use a plurality of second forms 60 when creating definition data. In that case, for example, the information processing apparatus 1 creates the definition data by applying the processing described above to each of the second forms 60.

In this case, the same target area, for example the one corresponding to column 52, may be obtained from more than one of the second forms 60. When the areas obtained from the respective second forms 60 overlap, the extraction unit 32 may, for example, extract them as a single area that contains all of them. In this way, the information processing apparatus 1 recognizes areas extracted from a plurality of second forms 60 as the same target area wherever possible.
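Merging overlapping areas into a single enclosing area can be sketched as follows. Rectangles are assumed to be (left, top, right, bottom) tuples, and using the bounding rectangle as the "single area that contains all of them" is an assumption; the function names are hypothetical.

```python
def boxes_overlap(a: tuple, b: tuple) -> bool:
    """True when two (left, top, right, bottom) rectangles share an area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def union_box(a: tuple, b: tuple) -> tuple:
    """Smallest rectangle that contains both rectangles."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_overlapping(boxes: list) -> list:
    """Merge every group of overlapping rectangles into one rectangle."""
    merged = []
    for box in boxes:
        current = box
        changed = True
        # Growing the rectangle may create new overlaps, so repeat until stable.
        while changed:
            changed = False
            for other in merged:
                if boxes_overlap(current, other):
                    merged.remove(other)
                    current = union_box(current, other)
                    changed = True
                    break
        merged.append(current)
    return merged


# Areas extracted from two second forms: the first two overlap, the third is separate.
extracted = [(10, 10, 50, 30), (40, 12, 80, 32), (200, 200, 240, 220)]
print(merge_overlapping(extracted))  # [(10, 10, 80, 32), (200, 200, 240, 220)]
```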

§3 Modification
Note that the information processing apparatus 1 according to this embodiment may further acquire a third form and use it to correct the definition data. The correction of definition data based on the third form is described with reference to FIGS. 10 to 12.

FIG. 10 illustrates a scene in which definition data is corrected according to this embodiment. The third form 70 is a form of the same type as the first form 50 and the second form 60 and contains information that exists in neither of them. Specifically, the information that does not exist in the first form 50 or the second form 60 is located in areas 73 and 74 of the third form 70. The characters "12345" are entered in area 73. A barcode of the same standard as the one in area 64, but encoding different content, is printed in area 74; for example, an NW-7 barcode representing a 10-digit number. Columns 71 and 72 of the third form 70 correspond to columns 51 and 52 of the first form 50, respectively.

FIG. 11 shows an example of the definition data correction process based on the third form 70, performed by the information processing apparatus 1 according to this embodiment.

In step 500, the acquisition unit 31 acquires the image data of the third form 70, in the same manner as in step 100.

In step 600, the definition creation unit 33 identifies, among the definition data stored in the storage unit 11, the definition data to be corrected. For example, the definition creation unit 33 may accept a designation of the definition data to be corrected and identify the target definition data from that designation. Alternatively, the definition creation unit 33 may refer to the form layout information set in the definition data stored in the storage unit 11 and identify as the correction target the definition data whose layout information best matches the third form 70 acquired in step 500.

In step 700, the definition data identified in step 600 is corrected based on the difference between the first form 50 and the third form 70. A specific example of this difference-based definition data correction process is shown in FIG. 12.

FIG. 12 is a flowchart illustrating the definition data correction process based on the extracted difference, performed by the information processing apparatus 1 according to this embodiment. Steps 710 to 770 correspond to steps 210 to 270 described above, with the second form 60 replaced by the third form 70. A detailed description of steps 710 to 770 is therefore omitted.

However, in steps 720, 750, and 770, the parameter values for the information reading position (coordinates), the information attributes (character type, character count, etc.), and the item name set in the definition data are each corrected as necessary, based on the information extracted from the third form 70.

For example, regarding the information reading position (coordinates), when an area extracted from the third form 70 overlaps an area extracted from the second form 60, the definition creation unit 33 corrects the reading position that was set from the area extracted from the second form 60 to the position of an area that contains both the area extracted from the second form 60 and the area extracted from the third form 70. In the example shown in FIG. 10, for the reading area associated with column 52, the coordinates indicating area 63 are thereby corrected to coordinates indicating an area that contains both area 63 and area 73.

Regarding the information attributes (character type, character count, etc.), the definition creation unit 33 corrects the parameter values of each attribute so that they also accommodate the attributes of the information in the area extracted from the third form 70. In the example shown in FIG. 10, the definition creation unit 33 therefore corrects, for areas 63 and 73, the character type from "alphabet" to "alphabet and number" and the character count from "4" to "5". On the other hand, because the barcode in area 74 is an NW-7 barcode representing a 10-digit number, the definition creation unit 33 need not change the parameter values for areas 64 and 74, which remain barcode type "NW-7" and character count "12". The definition creation unit 33 may also correct the item name in the same manner as it corrects the attributes.
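The attribute correction can be sketched as follows. The sketch treats the character type as a set that is widened by union and the character count as a maximum that only grows; this reading is consistent with the FIG. 10 example (4 becomes 5, while 12 stays 12 when a 10-digit barcode is observed) but is not stated explicitly in the specification. The dict keys and the function name `widen_attributes` are hypothetical.

```python
def widen_attributes(defined: dict, observed: dict) -> dict:
    """Return attributes that accommodate both the values already set in the
    definition data and the values observed in the newly extracted area."""
    updated = dict(defined)
    # Character types are widened to cover every type seen so far.
    if "char_types" in defined and "char_types" in observed:
        updated["char_types"] = defined["char_types"] | observed["char_types"]
    # The character count is treated as a maximum and never shrinks.
    updated["char_count"] = max(defined["char_count"], observed["char_count"])
    return updated


# Areas 63 and 73: "alphabet", 4 characters, then "12345" is observed.
text_field = widen_attributes(
    {"char_types": {"alphabet"}, "char_count": 4},
    {"char_types": {"number"}, "char_count": 5},
)
print(sorted(text_field["char_types"]), text_field["char_count"])

# Areas 64 and 74: NW-7 with 12 characters, then a 10-digit NW-7 barcode is observed.
barcode_field = widen_attributes(
    {"barcode_type": "NW-7", "char_count": 12},
    {"barcode_type": "NW-7", "char_count": 10},
)
print(barcode_field)
```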

Through these processes, the definition data identified in step 600 is corrected based on the difference between the first form 50 and the third form 70. The process then proceeds to step 800.

Returning to FIG. 11, in step 800, the control unit 12 accepts manual corrections to the definition data. Step 800 can be described in substantially the same way as step 300, so a detailed description is omitted.

In step 900, the control unit 12 executes the process of saving the corrected definition data. Step 900 can be described in substantially the same way as step 400 described above, so a detailed description is omitted. The information processing apparatus 1 then ends the processing of this operation example.

DESCRIPTION OF SYMBOLS
1 ... information processing apparatus, 2 ... scanner,
11 ... storage unit, 12 ... control unit, 13 ... bus, 14 ... input/output unit, 15 ... communication unit,
21 ... dictionary data,
31 ... acquisition unit, 32 ... extraction unit, 33 ... definition creation unit,
50 ... first form, 51 to 52 ... columns,
60 ... second form, 61 to 62 ... columns, 63 to 64 ... areas,
70 ... third form, 71 to 72 ... columns, 73 to 74 ... areas

Claims

An information processing apparatus comprising:
an acquisition unit that acquires image data of a first form and image data of a second form, the second form being a form of the same type as the first form and containing information that does not exist in the first form;
an extraction unit that extracts, from the difference between the image data of the first form and the image data of the second form, an area of the second form in which the information exists; and
a definition creation unit that creates definition data used for reading processing of forms of the same type as the first form and the second form, in which the position of the extracted area in the second form is set as an information reading position.

The information processing apparatus according to claim 1, wherein
the definition creation unit sets, in the definition data, an attribute of the information existing in the extracted area, obtained by applying a predetermined recognition process to the extracted area, as an attribute relating to the extracted area.

The information processing apparatus according to claim 1 or 2, wherein
the extraction unit identifies a print area in which characters are printed, located around the position of the extracted area in the first form or the second form, and
the definition creation unit sets the name indicated by the characters acquired from the print area in the definition data as the item name of the extracted area.

The information processing apparatus according to claim 3, wherein
the definition creation unit refers to dictionary data in which names used as item names of areas included in forms are registered and, when the name indicated by the characters acquired from the print area is not registered in the referenced dictionary data, corrects that name to a registered name that is similar to it and sets the corrected name in the definition data as the item name of the extracted area.

The information processing apparatus according to any one of claims 1 to 4, wherein
the acquisition unit further acquires image data of a third form, the third form being a form of the same type as the first form and the second form and containing information that exists in neither the first form nor the second form,
the extraction unit further extracts, based on the difference between the image data of the first form and the image data of the third form, an area of the third form in which the information that exists in neither the first form nor the second form is located, and
when the area extracted from the image data of the third form overlaps the area extracted from the image data of the second form, the definition creation unit corrects the information reading position that was set from the area extracted from the image data of the second form to the position of an area that contains the areas extracted from the image data of the second form and of the third form.

An information processing method in which a computer executes:
acquiring image data of a first form and image data of a second form, the second form being a form of the same type as the first form and containing information that does not exist in the first form;
extracting, from the difference between the image data of the first form and the image data of the second form, an area of the second form in which the information exists; and
creating definition data used for reading processing of forms of the same type as the first form and the second form, in which the position of the extracted area in the second form is set as an information reading position.

A program for causing a computer to execute:
acquiring image data of a first form and image data of a second form, the second form being a form of the same type as the first form and containing information that does not exist in the first form;
extracting, from the difference between the image data of the first form and the image data of the second form, an area of the second form in which the information exists; and
creating definition data used for reading processing of forms of the same type as the first form and the second form, in which the position of the extracted area in the second form is set as an information reading position.