JP6281739B2

JP6281739B2 - Processing apparatus and program

Info

Publication number: JP6281739B2
Application number: JP2013234876A
Authority: JP
Inventors: 勇作栗原; 政登杉井; 裕次郎小林; 大悟堀江; 卓史村上; 未希鬼束
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2013-11-13
Filing date: 2013-11-13
Publication date: 2018-02-21
Anticipated expiration: 2033-11-13
Also published as: JP2015095144A

Description

The present invention relates to a processing apparatus and a program.

Patent Document 1 discloses an apparatus that assigns attribute information to character areas of stored image data before character recognition processing is performed, with the aim of easily and reliably generating character codes accompanied by attribute information such as search keys from image data read with a scanner, and thereby shortening the time required for registration work. Specifically, a character area is designated by an area frame formed by an area frame forming unit, the position information of the designated area is associated with attribute information arbitrarily selected from an attribute information file, and this correspondence is stored in a correspondence information file. A character recognition unit then performs character recognition processing on the character area specified by the area frame and outputs the recognition result together with the corresponding attribute information, yielding a character code accompanied by attribute information.

Patent Document 2 discloses an image input apparatus that scans an image document and stores the image and an index in a document management system. Using components such as form recognition, OCR, and a user interface, the apparatus evaluates the validity of the original and then inputs the image document and the index into the document management system.

Patent Document 3 discloses an electronic document registration system that can generate metadata for a document to be registered, as follows. In the disclosed system, an electronic form is used to create a document: entering data into the electronic form creates the electronic data of the document, and printing the filled-in electronic form produces the document. The printed document carries a two-dimensional barcode in which part of the data of the electronic form is encoded. When the image data of the document is registered as an electronic document, the data obtained by decoding the image of the two-dimensional barcode contained in that image data is used as metadata.

Patent Document 4 discloses the following as an electronic resume system that enables unified management of resume data using an electronic form and a two-dimensional barcode. A terminal device creates an electronic resume according to a resume form. When the resume is printed, a server converts the resume data from the terminal device into a two-dimensional barcode and sends electronic resume data with the two-dimensional barcode for printing, and the terminal device prints the resume with the two-dimensional barcode. A personnel information system reads the resume with the two-dimensional barcode sent by the applicant with a scanner, restores the two-dimensional barcode to resume data, assigns a common receipt number to each of the resume data, photo image data, seal image data, and so on, and registers them in a registration database.

Patent Document 1: Japanese Patent Laid-Open No. 8-083285
Patent Document 2: JP 2008-293353 A
Patent Document 3: JP 2008-097066 A
Patent Document 4: JP 2006-163677 A

An object of the present invention is to provide a processing apparatus and a program that can extract content information entered in a specific area of a printed recording medium without requiring the area to be designated after printing.

The present invention according to claim 1 is a processing apparatus comprising: first acquisition means for acquiring first image data to which information representing a position and a range is added; second acquisition means for acquiring second image data obtained by reading a recording medium on which an image corresponding to the first image data is printed; and recognition means for recognizing content information entered on the recording medium from the area of the second image data that corresponds to the area specified by the position and range.

The present invention according to claim 2 is the processing apparatus according to claim 1, wherein the first acquisition means acquires first image data to which, in addition to the information representing the position and range, information on the type of data to be entered is added, and the recognition means recognizes the content information with reference to the information on the data type.

The present invention according to claim 3 is the processing apparatus according to claim 1 or 2, further comprising reading means for reading a recording medium on which the information representing the position and range is printed, wherein the second acquisition means acquires the information representing the position and range printed on the recording medium.

The present invention according to claim 4 is the processing apparatus according to claim 1 or 2, further comprising reading means for reading a recording medium on which storage destination information is printed, the storage destination information indicating where the information representing the position and range is stored, wherein the second acquisition means acquires the storage destination information printed on the recording medium.

The present invention according to claim 5 is the processing apparatus according to any one of claims 1 to 4, wherein the recognition means performs recognition on those of the areas in which no content information is entered in the first image data.

The present invention according to claim 6 is the processing apparatus according to any one of claims 1 to 5, further comprising update means for updating the first image data to third image data in which the content information recognized by the recognition means is added to the first image data.

The present invention according to claim 7 is the processing apparatus according to claim 6, wherein, when there is an area in which the recognition means cannot recognize the content information, the update means adds the portion of the second image data corresponding to that area to the first image data.

The present invention according to claim 8 is the processing apparatus according to any one of claims 1 to 7, further comprising extraction means for extracting a difference between a plurality of image data and extracting a position and a range from information on the difference, wherein the recognition means recognizes content information entered in the area corresponding to the position and range extracted by the extraction means.

The present invention according to claim 9 is a program that causes a computer to execute: a step of acquiring first image data to which information representing a position and a range is added; a step of acquiring second image data obtained by reading a recording medium on which the first image data is printed; and a step of recognizing, from the second image data, content information entered on the recording medium in the area specified by the position and range.

According to the present invention of claim 1, content information entered in a specific area of a printed recording medium can be extracted without the area being designated after printing.

According to the present invention of claim 2, the content information can be extracted more accurately than when it is extracted without reference to the information on the data type.

According to the present invention of claim 3, the information representing the position and range from which the content information is to be extracted can be acquired from the recording medium.

According to the present invention of claim 4, the storage destination of the information representing the position and range from which the content information is to be extracted can be identified.

According to the present invention of claim 5, fewer areas need to be processed than when extraction is performed on all areas.

According to the present invention of claim 6, image data reflecting the added content can be generated.

According to the present invention of claim 7, image data reflecting the content information can be generated even when the content information cannot be extracted.

According to the present invention of claim 8, content information entered in a specific area can be extracted without the area being designated, even for a recording medium that carries no information representing the position and range.

According to the present invention of claim 9, content information entered in a specific area of a printed recording medium can be extracted without the area being designated after printing.

FIG. 1 is a schematic diagram showing a document processing system 2 according to an embodiment of the present invention.
FIG. 2 is a cross-sectional view showing an image forming apparatus 4.
FIG. 3 is a block diagram showing the hardware configuration of the image forming apparatus 4.
FIG. 4 is a block diagram showing the functional configuration of the image forming apparatus 4 realized by executing a program.
FIG. 5 is a flowchart showing an example of the operation of the image forming apparatus 4, acting as a document processing apparatus, up to the printing of an electronic document with form fields.
FIG. 6 is a schematic diagram showing an example of an electronic document with form fields acquired by a first acquisition unit 70.
FIG. 7 is a schematic diagram showing an example of a recording medium on which an electronic document is printed together with information indicating the storage location.
FIG. 8 is a flowchart showing an example of the operation of the image forming apparatus 4, acting as a document processing apparatus, when meta information is extracted from the printed recording medium after an electronic document with form fields has been printed.
FIG. 9 is a schematic diagram showing an example of a read image of the recording medium when additions have been made to the recording medium after the electronic document shown in FIG. 6 was printed on it.
FIG. 10 is a schematic diagram showing the meta information extracted from the electronic document and the read image shown in FIGS. 6 and 9, in which (a) shows the meta information extracted from the electronic document and (b) shows the meta information extracted from the read image.
FIG. 11 is a block diagram showing the functional configuration of the image forming apparatus 4 according to a second embodiment, realized by executing a program.
FIG. 12 is a schematic diagram showing a plurality of read images acquired by a third acquisition unit 110.
FIG. 13 is a schematic diagram showing the content common to the plurality of read images shown in FIG. 12.
FIG. 14 is a schematic diagram explaining an example of the form field information generated by an entry field information generation unit 116.
FIG. 15 is a flowchart showing an example of the operation for generating form field information in the second embodiment.

Hereinafter, a first embodiment of the present invention will be described in detail with reference to the drawings.
FIG. 1 is a schematic diagram showing a document processing system 2 according to an embodiment of the present invention. As shown in FIG. 1, the document processing system 2 comprises an image forming apparatus 4, which functions as a document processing apparatus, and a terminal device 6. The image forming apparatus 4 and the terminal device 6 are connected by a network 8 such as a LAN, a WAN, or the Internet.

The terminal device 6 is configured as, for example, a personal computer, and creates image data by executing application software. The image data created by the terminal device 6 includes electronic documents (document data). The terminal device 6 also transmits the image data to the image forming apparatus 4 acting as the document processing apparatus.

In the present embodiment, the image data generated by the terminal device 6 has added to it at least information representing the position of each entry field (for example, its coordinates) and its range (for example, its vertical and horizontal lengths). As an example of such image data, the terminal device 6 creates an electronic document with form fields. A form field is information representing the position and range, within the document, of an entry field provided in the document, together with the type of data to be entered in that field. An entry field only needs to have an area in which an entry can be made; it does not necessarily have a frame drawn with ruled lines or the like. A data type is a category of data, grouped (classified) in advance, that is referred to when recognizing the content information entered in an entry field in order to improve recognition accuracy; examples are “character”, “alphabetic character”, and “number”. The type of data to be entered in an entry field indicates which of the predetermined categories of data is to be entered there. The predetermined categories are not limited to “character”, “alphabetic character”, and “number”; they may also be “check box”, “radio button”, “formula”, “amount of money”, “date”, or “image”.
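The form field described above can be pictured as a small record holding a position, a range, and a data type. The following is a minimal sketch under that reading; the class names, attribute names, and the sample field are illustrative assumptions and do not come from the patent or from any document format.

```python
from dataclasses import dataclass
from enum import Enum


class DataType(Enum):
    # A few of the data types named in the description; the enum is illustrative.
    CHARACTER = "character"
    ALPHABETIC = "alphabetic character"
    NUMBER = "number"
    CHECK_BOX = "check box"
    DATE = "date"


@dataclass(frozen=True)
class FormField:
    """One entry field: position, range, and expected data type."""
    name: str
    x: int        # left edge of the entry field (position)
    y: int        # top edge of the entry field (position)
    width: int    # horizontal length (range)
    height: int   # vertical length (range)
    data_type: DataType

    def box(self):
        """Return (left, top, right, bottom) of the entry field."""
        return (self.x, self.y, self.x + self.width, self.y + self.height)


# Hypothetical field used only for illustration.
amount = FormField("amount", x=120, y=300, width=200, height=40,
                   data_type=DataType.NUMBER)
print(amount.box())
```

Keeping the position and range in the same coordinate system as the page image means the same record can later be used to locate the field in a scan of the printed page.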

For example, document data formats such as PDF (Portable Document Format) and DocuWorks (registered trademark) allow form fields to be added to document data. These formats provide form fields such as text fields, check boxes, radio buttons, list boxes, and combo boxes.

A text field is an entry field in which characters are entered. A check box is an entry field for marking the applicable options among a plurality of options. A radio button is an entry field for marking exactly one applicable option among a plurality of options. A list box is an entry field that displays predetermined options as a list, from which one is selected. A combo box is an entry field in which either one of a displayed list of predetermined options is selected or characters are entered.

In this way, the terminal device 6 transmits to the image forming apparatus 4, as image data, document data with form fields in, for example, PDF or DocuWorks format. The image forming apparatus 4 prints or stores the received image data. When document data with form fields is printed by the image forming apparatus 4, the data is not converted into print data by, for example, a printer driver; instead, the document data with form fields is sent directly to the image forming apparatus 4 and printed (direct printing).

FIG. 2 is a cross-sectional view showing the image forming apparatus 4. The image forming apparatus 4 has a UI device 10, a printing device 12, a reading device 14, and a communication device 16.

The UI device 10 is configured as, for example, a touch panel, and functions both as a display device that displays information and as an input reception device that accepts input from an operator.

The printing device 12 is a device that performs printing. It has, for example, three tiers of recording medium supply cassettes 18, each of which is provided with a supply head 20.

When one of the recording medium supply cassettes 18 is selected, the supply head 20 operates and a recording medium is fed from the selected cassette through a recording medium supply path 22 to an image forming mechanism 24.

The image forming mechanism 24 has photoconductors 26 for yellow, magenta, cyan, and black arranged side by side, and an intermediate transfer belt 28.

Around each photoconductor 26 are arranged a charging device, an exposure device, a developing device, a primary transfer device, a cleaning device, and the like (not shown), and the toner image formed on each photoconductor 26 is transferred to the intermediate transfer belt 28. When monochrome printing is set, only the black photoconductor is made operable.

The toner image on the intermediate transfer belt 28 is transferred by a secondary transfer roll 30 to the recording medium that has been fed in, and is fixed by a fixing device 32. The recording medium with the fixed toner image passes through a recording medium discharge path 34 and is discharged to a discharge unit 36.

When duplex printing is set, however, the recording medium whose front side has been fixed by the fixing device 32 is sent from the recording medium discharge path 34 to a reversing device 38, reversed there, sent to a recording medium reversing path 40, returned to the recording medium supply path 22, and sent to the image forming mechanism 24 again, where the back side is printed.

The reading device 14 has an automatic document feeder 42 capable of reading double-sided documents. The automatic document feeder 42 feeds a document to a platen 44, where the image of the document is read by a reading unit 46 consisting of a CCD or the like.

A document set detector 48 is provided to detect whether a document has been set in the automatic document feeder 42. The automatic document feeder 42 also serves as a platen cover, and a document can be placed on the platen 44 by opening this cover. The opening and closing of the platen cover can be detected by a platen cover open/close detector 50.

The communication device 16 is a device that communicates with external devices such as the terminal device 6 via the network 8; a data circuit-terminating device is one example. The image forming apparatus 4 uses the communication device 16 to acquire, for example, document data transmitted from the terminal device 6.

FIG. 3 is a block diagram showing the hardware configuration of the image forming apparatus 4.

As shown in FIG. 3, the image forming apparatus 4 has a configuration in which a CPU 60, a memory 62, and a storage device 64 are connected by a bus together with the printing device 12, the reading device 14, the UI device 10, and the communication device 16 described above.

The image forming apparatus 4 thus includes the components of a computer capable of communicating with the terminal device 6 and other devices.

The CPU 60 controls the operation of the image forming apparatus 4 by executing a program written in the memory 62 or the storage device 64. Input accepted via the UI device 10 is passed to the CPU 60, and display information from the CPU 60 is passed to the UI device 10.

The CPU 60 may execute a program stored on a portable storage medium such as a CD-ROM (not shown), or a program provided through the communication device 16.

The storage device 64 stores data, for example on a hard disk, so that the data can be written and read.

Next, the functional configuration of the image forming apparatus 4 as a document processing apparatus will be described.
FIG. 4 is a block diagram showing the functional configuration of the image forming apparatus 4 realized by executing a program.

As shown in FIG. 4, the image forming apparatus 4 of the present embodiment includes a first acquisition unit 70, a storage control unit 72, a print control unit 74, a second acquisition unit 76, a code detection unit 78, a recognition unit 80, an analysis unit 82, a content information extraction unit 84, and a document update unit 86. With this configuration, when, for example, an electronic document with form fields is printed on a recording medium such as paper and additions such as handwritten or printed entries are made on the printed medium, the apparatus reads the recording medium bearing the additions and recognizes the content entered after printing. It further extracts content information (meta information) representing content already recorded as data in the electronic document before printing, and content information (meta information) representing content added after printing. Hereinafter, “content information” may be referred to as “meta information”.

The first acquisition unit 70 acquires image data to which information representing the position and range of each entry field is added. In the present embodiment, the first acquisition unit 70 acquires an electronic document with form fields: the terminal device 6 transmits to the image forming apparatus 4 an electronic document with form fields that is to be printed there, and the first acquisition unit 70 of the image forming apparatus 4 acquires it. As described later, the first acquisition unit 70 also acquires image data (electronic documents) stored in the storage device 64.

The storage control unit 72 performs control so that the image data acquired by the first acquisition unit 70 is stored in the storage device 64. In the present embodiment, it stores in the storage device 64 the electronic document with form fields acquired from the terminal device 6 for printing, and notifies the print control unit 74 of the storage location. Although the present embodiment describes storing the data in the storage device 64 provided in the image forming apparatus 4, the data may instead be stored in a device other than the image forming apparatus 4, for example via the network 8.

The print control unit 74 performs control so that the image data to be printed, acquired by the first acquisition unit 70, is printed. In doing so, the print control unit 74 prints the information indicating the storage location, notified by the storage control unit 72, together with the electronic document. In the present embodiment, the print control unit 74 adds to the electronic document with form fields acquired by the first acquisition unit 70 an optical code image, such as a two-dimensional code, in which the information indicating the storage location is encoded, and performs control so that the electronic document with the optical code image is printed.

With the above configuration, an electronic document with form fields is printed on a recording medium. Additions are then made to the printed recording medium as described above. The following describes the configuration for recognizing the added content from the printed recording medium bearing the additions and for extracting the content information (meta information).

The second acquisition unit 76 acquires an image obtained by reading the recording medium on which the image data acquired by the first acquisition unit 70 was printed. In the present embodiment, it acquires an image obtained by reading a recording medium on which the electronic document was printed together with the information indicating the storage location notified by the storage control unit 72. Also in the present embodiment, the second acquisition unit 76 acquires the image obtained when the reading device 14 reads the recording medium. The image acquired by the second acquisition unit 76 is described here as containing additions, but it does not necessarily have to contain any.

The code detection unit 78 detects the optical code image in the image acquired by the second acquisition unit 76. The first acquisition unit 70 then decodes, from the optical code image detected by the code detection unit 78, the information indicating the storage location of the image data from which the read image originated, and acquires that original image data from the storage location.

The recognition unit 80 performs recognition processing such as OCR (Optical Character Recognition) on the image acquired by the second acquisition unit 76 and recognizes the content added to the entry fields. Here, the recognition unit 80 recognizes the content entered in the entry fields after printing, based on the information added to the image data acquired by the first acquisition unit 70. In the present embodiment, the recognition unit 80 performs this recognition on the image acquired by the second acquisition unit 76 based on the position and range of each entry field, obtained from the form field information added to the electronic document acquired by the first acquisition unit 70, and on the type of data to be entered in that entry field.

Specifically, within the image acquired by the second acquisition unit 76, the recognition unit 80 identifies the position and area of each entry field based on the form field information and performs recognition processing on the identified portion. During recognition, it also restricts the type of data to be recognized based on the form field information. For example, when the form field information shows that the data to be entered in the target entry field is numeric, the recognition unit 80 assumes that the target is a number and recognizes it by comparison with pattern images of digits.
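The effect of restricting the data type can be illustrated with a minimal Python sketch. It assumes that pattern matching has already produced a similarity score for each candidate character; the `ALLOWED` table, the `pick_character` function, and the type name "number" are hypothetical names chosen for illustration and are not part of the embodiment.

```python
# Hypothetical table: characters permitted for each data type.
# Types missing from the table are treated as unrestricted.
ALLOWED = {
    "number": set("0123456789"),
}

def pick_character(scores, data_type):
    """Pick the best-scoring candidate that the data type permits.

    scores: dict mapping a candidate character to its similarity score.
    Returns None when no candidate is permitted for the data type.
    """
    allowed = ALLOWED.get(data_type)
    candidates = {
        char: score
        for char, score in scores.items()
        if allowed is None or char in allowed
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)
```

With a numeric field, a stroke that scores slightly higher as the letter "O" than as the digit "0" would still be read as "0", because "O" is never considered as a candidate.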

The recognition unit 80 may additionally use the analysis result of the analysis unit 82, described next, when performing recognition processing.
The analysis unit 82 analyzes the content entered, on the electronic document itself, in the entry fields provided in the electronic document acquired by the first acquisition unit 70. This analysis includes determining whether each entry field has been filled in on the electronic document (a filled-in entry field) or not (a blank entry field). Thus, rather than analyzing the content of the entry fields from the read image, the analysis unit 82 analyzes the content contained in the electronic document directly from the electronic document. For example, when a character string has been entered electronically in an entry field designated by a form field, the analysis unit 82 extracts that character string from the electronic document.

Accordingly, among the entry fields provided in the document acquired by the first acquisition unit 70, the recognition unit 80 performs recognition processing on the image acquired by the second acquisition unit 76 for those entry fields that the analysis unit 82 has determined to be blank. In this way, rather than treating every entry field in the document as a recognition target, the recognition unit 80 limits recognition to entry fields that are blank in the electronic document, that is, entry fields expected to be filled in after printing.
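The selection of recognition targets can be sketched as follows. This is only an illustration under assumed data structures: the `FormField` record and the `fields_to_recognize` function are hypothetical, and the embodiment does not prescribe how form field information is represented.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class FormField:
    """Hypothetical record for one entry field of the electronic document."""
    name: str
    box: Tuple[int, int, int, int]   # position and range: x, y, width, height
    data_type: str                   # e.g. "character" or "image"
    value: Optional[str] = None      # content entered electronically, if any

def fields_to_recognize(fields):
    """Return only the fields left blank in the electronic document."""
    return [field for field in fields if not field.value]
```

Only the returned fields would then be cut out of the scanned image and passed to recognition; fields that already carry an electronic value are skipped.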

The content information extraction unit 84 extracts content information (meta information), which is information representing the content shown in the image acquired by the first acquisition unit 70 and in the image acquired by the second acquisition unit 76, from those two images. Specifically, from the image acquired by the first acquisition unit 70 it extracts, as content information, the content of the filled-in entry fields as analyzed by the analysis unit 82; from the image acquired by the second acquisition unit 76 it extracts, as content information, the content recognized by the recognition unit 80.

The content information (meta information) extracted by the content information extraction unit 84 is registered in a database provided in the storage device 64 or the like, and is used, for example, in data processing. The meta information may also be added to the original electronic document or to the read image.

The document update unit 86 adds the content recognized by the recognition unit 80 to the image data acquired by the first acquisition unit 70, thereby updating that image data. The result can then be used as image data that includes the added entries. In addition, for an image area that the recognition unit 80 failed to recognize, or for the image area of an entry field whose data type is defined as "image", the document update unit 86 may add the partial image data of that area to the image data acquired by the first acquisition unit 70.
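The update rule, including the fallback to partial image data, might look like the following sketch. The dictionary-based field records, the `apply_recognition` function, and the convention that a failed recognition is reported as `None` are all assumptions made for illustration.

```python
def apply_recognition(fields, recognized, crops):
    """Return updated copies of the field records.

    fields:     list of dicts with the keys "name" and "data_type".
    recognized: dict mapping a field name to recognized text, or to None
                when recognition failed (an assumed convention).
    crops:      dict mapping a field name to the partial image data cut
                out of the scanned image for that field.
    """
    updated = []
    for field in fields:
        record = dict(field)              # leave the input untouched
        name = record["name"]
        text = recognized.get(name)
        if record["data_type"] != "image" and text is not None:
            record["value"] = text        # recognized content is added
        elif name in crops:
            record["image"] = crops[name] # fallback: attach the partial image
        updated.append(record)
    return updated
```

Fields of type "image" and fields whose recognition failed both end up carrying the cropped image rather than text, so nothing written on the paper is lost from the updated document.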

When only blank entry fields are to be recognized, the recognition unit 80 may be configured to take as recognition targets the entry fields determined to be blank based on the image data updated by the document update unit 86. For example, suppose there are three entry fields A, B, and C; entry field A is filled in during a first round of additions and entry fields B and C during a second round; and the image is read and the added content recognized after each round. In the first recognition, the recognition unit 80 targets entry fields A, B, and C; in the second, it targets only entry fields B and C. At the first recognition, the analysis unit 82 determines that entry fields A, B, and C are blank, so the recognition unit 80 targets all three. After the first recognition, the document update unit 86 updates the image data to image data in which entry field A is filled in; at the second recognition, the recognition unit 80 targets entry fields B and C, which are determined to be blank based on the updated image data.
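The two-round example with entry fields A, B, and C can be traced with a short sketch. The document is modeled, purely as an assumption, as a dictionary from field name to entered content; `blank_fields` and `merge_entries` are hypothetical helper names.

```python
def blank_fields(document):
    """Names of the entry fields that are still blank, in sorted order."""
    return sorted(name for name, value in document.items() if not value)

def merge_entries(document, recognized):
    """Return a copy of the document updated with newly recognized entries."""
    updated = dict(document)
    updated.update(recognized)
    return updated

document = {"A": "", "B": "", "C": ""}

# First round: every field is blank, so all three are recognition targets.
first_targets = blank_fields(document)

# Only A was filled in on paper, so only A is recognized and written back.
document = merge_entries(document, {"A": "entry for A"})

# Second round: the updated document shows A as filled in.
second_targets = blank_fields(document)
```

Here `first_targets` is `['A', 'B', 'C']` and `second_targets` is `['B', 'C']`, matching the behavior described for the two recognitions.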

Next, the operation of the image forming apparatus 4 as a document processing apparatus will be described.
FIG. 5 is a flowchart showing an example of the operation of the image forming apparatus 4, as a document processing apparatus, up to the printing of an electronic document with form fields.

In step 100 (S100), the first acquisition unit 70 acquires the electronic document transmitted from the terminal device 6.

In step 102 (S102), it is determined whether the acquired electronic document has form fields. If it does, the process proceeds to step 104; if it does not, the process proceeds to step 108.

In step 104 (S104), the storage control unit 72 performs control so that the acquired electronic document is stored in the storage device 64.

In step 106 (S106), the print control unit 74 adds an optical code image indicating the storage location to the electronic document and performs control so that the electronic document is printed.

In step 108 (S108), the electronic document is printed. Then, in step 110 (S110), it is determined whether all pages have been processed. If unprocessed pages remain, the process returns to step 102; if all pages have been processed, the process ends.
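The flow of steps S100 to S110 can be summarized in a small sketch. Everything concrete in it is an assumption: pages are plain dictionaries, the storage device is a dictionary, the storage location is a made-up key of the form `doc/N`, and the optical code is represented by a text tag rather than an actual two-dimensional code image.

```python
def print_document(pages, storage):
    """Simulate the print flow and return what would be printed per page.

    pages:   list of dicts with the keys "content" and "has_form_fields".
    storage: dict standing in for the storage device.
    """
    printed = []
    for page in pages:                                  # loop closed by S110
        output = page["content"]
        if page["has_form_fields"]:                     # S102
            location = f"doc/{len(storage)}"            # hypothetical location key
            storage[location] = page                    # S104: store the document
            output = f"{output} [code:{location}]"      # S106: add the code
        printed.append(output)                          # S108: print
    return printed
```

A page without form fields is printed unchanged and nothing is stored for it, which corresponds to the direct branch from step 102 to step 108.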

FIG. 6 is a schematic diagram showing an example of an electronic document with form fields acquired by the first acquisition unit 70. In the example shown in FIG. 6, each hatched area represents an entry field accompanied by form field information. Entry fields 90 to 98 are fields whose data type is "character", and entry field 100 is a field in which an image such as a seal impression is placed. In this example, entry fields 90 and 92 have already been filled in on the electronic document: entry field 90 contains the character string "XXX" and entry field 92 contains the character string "YYY".

FIG. 7 is a schematic diagram showing an example of a recording medium on which the electronic document has been printed together with the information indicating the storage location. As shown in FIG. 7, in the present embodiment a two-dimensional code 102 is superimposed on the electronic document and printed as the information indicating where the electronic document is stored.

FIG. 8 is a flowchart showing an example of the operation of the image forming apparatus 4, as a document processing apparatus, when meta information is extracted from a printed recording medium after an electronic document with form fields has been printed.

In step 200 (S200), the second acquisition unit 76 acquires an image obtained by reading the recording medium on which the electronic document is printed.

In step 202 (S202), the code detection unit 78 detects the optical code image contained in the image acquired in step 200. If an optical code is detected (Yes in step 204), the process proceeds to step 206; if not (No in step 204), the process proceeds to step 218.

In step 206 (S206), the first acquisition unit 70 acquires, from the storage device 64, the electronic document at the storage location indicated by the optical code.

In step 208 (S208), the analysis unit 82 analyzes the form field information added to the electronic document acquired in step 206. If the entry field of interest has already been filled in on the electronic document (Yes in step 210), the process proceeds to step 212; if it is blank on the electronic document (No in step 210), the process proceeds to step 214.

In step 212 (S212), the content information extraction unit 84 acquires the content of the entry field of interest from the electronic document and uses it as meta information.

In step 214 (S214), by contrast, the recognition unit 80 performs recognition processing on the entry field of interest based on the form field information and acquires the content added to the entry field from the read image. In step 216 (S216), the content information extraction unit 84 uses the content acquired in step 214 as meta information.

If no optical code is detected in step 204, the recognition unit 80 performs recognition processing in step 218 without using form field information. In this case, for example, the recognition unit 80 performs recognition processing on the entire image.

In step 220 (S220), it is determined whether processing has been completed for all entry fields indicated by the form field information. If so, the process proceeds to step 222; if not, the process returns to step 208.

In step 222 (S222), it is determined whether processing has been completed for all recording media from which meta information is to be extracted. If so, the series of processes ends; if not, the process returns to step 200.
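The per-sheet part of this flow (steps S202 to S220) can be sketched as below. The data layout is assumed for illustration only: a scan is a dictionary holding the decoded code and the image, the storage device is a dictionary keyed by storage location, and the two recognizers are passed in as callables because the embodiment does not fix a particular recognition engine. The names `extract_meta` and `full_page` are hypothetical.

```python
def extract_meta(scan, storage, recognize_field, recognize_whole):
    """Extract meta information for one scanned sheet.

    scan:            dict with "code" (storage location, or None when no
                     optical code was detected) and "image".
    storage:         dict mapping a storage location to a stored document,
                     itself a dict with a "fields" list.
    recognize_field: callable(image, field) returning recognized content.
    recognize_whole: callable(image) for the case without form field info.
    """
    location = scan["code"]                                  # S202 / S204
    if location is None:
        return {"full_page": recognize_whole(scan["image"])} # S218
    document = storage[location]                             # S206
    meta = {}
    for field in document["fields"]:                         # S208, loop via S220
        if field["value"]:                                   # S210: filled in?
            meta[field["name"]] = field["value"]             # S212
        else:
            meta[field["name"]] = recognize_field(scan["image"], field)  # S214, S216
    return meta
```

Calling this function once per scanned sheet corresponds to the outer loop that step 222 closes.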

FIG. 9 is a schematic diagram showing an example of a read image of the recording medium after the electronic document shown in FIG. 6 has been printed on it and entries have been added. Here, of the entry fields shown in FIG. 6, entry fields 94 to 100 have been filled in.

FIG. 10 is a schematic diagram showing the meta information extracted from the electronic document and the read image shown in FIGS. 6 and 9. FIG. 10(a) shows the meta information extracted from the electronic document, and FIG. 10(b) shows the meta information extracted from the read image. In the example shown in FIG. 10, the entry fields shown in FIGS. 6 and 9 are assumed to be managed as A1, A2, A3, and so on, in order from the top.

As shown in FIG. 10(a), for content already filled in within the electronic document, meta information is extracted from the electronic document. As shown in FIG. 10(b), for content added after printing, meta information is extracted from the read image. The content of entry field 100 is extracted as an image.

Next, a second embodiment of the present invention will be described. The first embodiment described document processing for an electronic document to which form field information had already been added; the present embodiment differs in that form field information is generated from read images of recording media. Specifically, as described below, the entry fields provided in a document are extracted from the differences between read images of a plurality of recording media that are documents of the same template, and the type of data entered in each entry field is determined, thereby generating form field information.

FIG. 11 is a block diagram showing the functional configuration of the image forming apparatus 4 according to the second embodiment, realized by executing a program.

As shown in FIG. 11, the image forming apparatus 4 of the present embodiment differs from the image forming apparatus 4 according to the first embodiment in that a third acquisition unit 110, a difference extraction unit 112, an entry field extraction unit 114, an entry field information generation unit 116, a sub-recognition unit 118, and a data type determination unit 120 are added.

The third acquisition unit 110 acquires a plurality of images obtained by reading a plurality of recording media. For example, it acquires read images of a plurality of recording media each bearing a document of the same template. The template is provided with entry fields, and the recording media whose images the third acquisition unit 110 acquires include ones whose entry fields have been filled in. In the present embodiment, the third acquisition unit 110 acquires images obtained when the reading device 14 reads the recording media.

FIG. 12 is a schematic diagram showing a plurality of read images acquired by the third acquisition unit 110. In the example shown in FIG. 12, an addressee, an amount, and the like have been written on each document sheet of a predetermined "invoice" template, in the entry fields defined by the template.

The difference extraction unit 112 extracts the differences among the plurality of images acquired by the third acquisition unit 110, and obtains the portions that are common to the images and the portions that are not.

FIG. 13 is a schematic diagram showing the content common to the plurality of read images shown in FIG. 12. As shown in FIG. 13, taking the difference between the read images yields the predetermined template information for the "invoice". In this way, extracting the portions common to the plurality of images generates the document's format information (template information).

Based on the differences among the plurality of images extracted by the difference extraction unit 112, the entry field extraction unit 114 extracts the portions that are not common to the images as entry fields. As a result, the portions freely written on each recording medium are extracted as entry fields. Specifically, the entry field extraction unit 114 extracts the position and range of each entry field. For example, it extracts the position of a portion that is not common to the images as the position of an entry field. It also takes ruled lines that surround such a non-common portion and that appear in common across the images, and extracts the range enclosed by those ruled lines as the range of the entry field.
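The separation into common and non-common portions can be sketched on tiny binary images. This is a deliberately simplified illustration, not the embodiment's method: images are assumed to be equally sized, perfectly aligned lists of rows with 1 for ink and 0 for blank, all non-common ink is treated as a single entry field, and the ruled-line step is omitted. The function names are hypothetical.

```python
def split_common_and_varying(images):
    """Split aligned binary images into a common mask and a varying mask.

    common:  ink present at the pixel in every image (template content).
    varying: ink present in some images but not all (handwritten content).
    """
    height, width = len(images[0]), len(images[0][0])
    common = [[0] * width for _ in range(height)]
    varying = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            values = [image[y][x] for image in images]
            if all(values):
                common[y][x] = 1
            elif any(values):
                varying[y][x] = 1
    return common, varying

def bounding_box(mask):
    """Smallest box (x_min, y_min, x_max, y_max) covering all set pixels."""
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))
```

The common mask plays the role of the template information, while the bounding box of the varying mask gives a rough position for an entry field. A practical implementation would also need image registration and tolerance for scanning noise, which this sketch ignores.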

The entry field information generation unit 116 generates form field information (entry field information), which is information about the entry fields, from the information extracted by the entry field extraction unit 114. Specifically, it generates form field information that includes the position and range of each entry field.

The form field information generated by the entry field information generation unit 116 may be added, for example, to an electronic "invoice" document that has no form field information. The electronic document to which the form field information is added may be the document template information obtained, as described above, when the difference extraction unit 112 extracts the portions common to the plurality of images. The first acquisition unit 70 may also acquire an electronic document to which form field information generated from the extraction result of the entry field extraction unit 114 has been added, and the processing described above may then be performed.

In the present embodiment, the entry field information generation unit 116 generates form field information that further includes information indicating the data type of each entry field as determined by the data type determination unit 120, described later.

The sub-recognition unit 118 performs recognition processing on the content written in the entry fields extracted by the entry field extraction unit 114. The content recognized by the sub-recognition unit 118 may be stored in a database as dictionary data recording candidates for what is entered in each entry field, and the recognition unit 80 may perform its recognition based on this dictionary data.

The data type determination unit 120 determines the type of data entered in an entry field based on the content recognized by the sub-recognition unit 118. Specifically, for example, when the content of the entry field of interest consists only of digits in every read image, the data type is determined to be "number".

Likewise, when the content of the entry field of interest in every read image is only an amount of money, the data type may be determined to be "amount", and when it is only a date, the data type may be determined to be "date". For example, the data type determination unit 120 determines "amount" when digits are written together with a symbol indicating money, such as "¥" or "$". Similarly, when digits are written separated by "/" or the like, the data type determination unit 120 determines "date".
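These determination rules can be approximated with regular expressions, as in the sketch below. The exact patterns, the function names, and the use of "standard" as a fallback for any other character string are assumptions made for illustration; the embodiment only states the kinds of cues (currency symbols, "/" separators, digits only) that are used.

```python
import re

def classify_value(text):
    """Classify one recognized string by simple, assumed patterns."""
    text = text.strip()
    if re.fullmatch(r"[¥￥$]\s*[\d,]+(\.\d+)?", text):
        return "amount"    # digits preceded by a currency symbol
    if re.fullmatch(r"\d{1,4}/\d{1,2}(/\d{1,4})?", text):
        return "date"      # digits separated by "/"
    if re.fullmatch(r"\d+", text):
        return "number"    # digits only
    return "standard"      # any other character string

def determine_field_type(values):
    """Type of a field, given its recognized content in every read image.

    A specific type is returned only when all images agree on it;
    otherwise the field falls back to "standard".
    """
    types = {classify_value(value) for value in values}
    return types.pop() if len(types) == 1 else "standard"
```

Requiring agreement across all read images reflects the condition that the content of the field is of one kind in each image; a field containing "123" on one sheet and "abc" on another is therefore treated as a general character string.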

Thus, in the present embodiment, the entry field information generation unit 116 generates form field information that further includes information indicating the data type of each entry field as determined by the data type determination unit 120.

FIG. 14 is a schematic diagram illustrating an example of the form field information generated by the entry field information generation unit 116. In the example shown in FIG. 14, each hatched area represents an extracted entry field, and the labels "standard", "amount", and "date" written in the hatched areas indicate the data type of each entry field. "Standard" is the type indicating data consisting of a character string.

FIG. 15 is a flowchart showing an example of the operation for generating form field information in the second embodiment.

In step 300 (S300), the third acquisition unit 110 acquires read images of a plurality of recording media.

In step 302 (S302), the difference extraction unit 112 performs difference extraction on the read images acquired in step 300. Portions common to the plurality of images (No in step 304) are extracted in step 306 (S306) as the document's format information (template information). Portions not common to the plurality of images (Yes in step 304) are extracted in step 308 (S308) as entry fields.

In step 310 (S310), the sub-recognition unit 118 performs recognition processing on the content written in the entry fields extracted in step 308.

In step 312 (S312), the data type determination unit 120 determines the type of data entered in each entry field based on the recognition result of step 310.

In step 314 (S314), the entry field information generation unit 116 generates form field information based on the information about the entry fields obtained in steps 308 and 312.

In step 316 (S316), an electronic document with form fields is created from the document format information generated in step 306 and the form field information generated in step 314.

Embodiments in which the document processing apparatus is realized by an image forming apparatus have been described above. However, the document processing apparatus need not be realized by an image forming apparatus, and need not include a printing device, a reading device, or the like.

Also, although a configuration in which the first acquisition unit 70 acquires image data transmitted from the terminal device 6 has been described, it may instead acquire image data held by the image forming apparatus 4 itself or image data created by the image forming apparatus 4 itself.

Further, the embodiments described above extract content information from a read image of a recording medium printed from image data to which information representing the position and range of each entry field and the type of data entered in that field has been added. The content information may, however, be extracted without using the data type information.

That is, for example, the first acquisition unit 70 may acquire first image data to which information representing the position and range of an entry field is added; the second acquisition unit 76 may acquire second image data obtained by reading a recording medium on which an image corresponding to the first image data is printed; and the recognition unit 80 may recognize the content information entered in the area of the second image data that corresponds to the area specified by the position and range information added to the first image data.

The position and range information added to the first image data and used by the recognition unit 80 may also be obtained from a source other than the first image data stored in the storage device 64. For example, when printing the first image acquired by the first acquisition unit 70, the print control unit 74 may print the information representing the position and range of the entry fields added to the first image on the recording medium together with the first image; the reading device 14 may read this recording medium; the second acquisition unit 76 may acquire a second image containing the first image and the information representing the position and range of the entry fields; and the recognition unit 80 may perform recognition processing based on the position and range information obtained by reading the recording medium.

2 Document processing system
4 Image forming apparatus
6 Terminal device
12 Printing device
14 Reading device
64 Storage device
70 First acquisition unit
72 Storage control unit
74 Print control unit
76 Second acquisition unit
78 Code detection unit
80 Recognition unit
82 Analysis unit
84 Content information extraction unit
86 Document update unit
110 Third acquisition unit
112 Difference extraction unit
114 Entry field extraction unit
116 Entry field information generation unit
118 Sub-recognition unit
120 Data type determination unit

Claims

A processing apparatus comprising:
first acquisition means for acquiring first image data to which information representing a position and a range is added;
second acquisition means for acquiring second image data obtained by reading a recording medium on which an image corresponding to the first image data is printed;
recognition means for recognizing content information entered on the recording medium from the region of the second image data corresponding to the region specified by the position and range; and
updating means for updating to third image data in which the content information recognized by the recognition means is added to the first image data,
wherein, when there is a region in which the recognition means cannot recognize the content information, the updating means adds the second image data corresponding to that region to the first image data.

The processing apparatus according to claim 1, wherein the first acquisition means acquires first image data to which, in addition to the information representing the position and range, information on the type of data to be entered is added, and
the recognition means recognizes the content information with reference to the information on the type of data.

The processing apparatus according to claim 1, further comprising reading means for reading a recording medium on which the information representing the position and range is printed,
wherein the second acquisition means acquires the information representing the position and range printed on the recording medium.

The processing apparatus according to claim 1, further comprising reading means for reading a recording medium on which storage destination information, indicating where the information representing the position and range is stored, is printed,
wherein the second acquisition means acquires the storage destination information printed on the recording medium.

The processing apparatus according to any one of claims 1 to 4, wherein the recognition means performs recognition on those of the regions in which no content information is entered in the first image data.

The processing apparatus according to claim 1, further comprising extraction means for extracting a difference between a plurality of pieces of image data and extracting a position and a range from information on the difference,
wherein the recognition means recognizes content information entered in the region corresponding to the position and range extracted by the extraction means.

A program causing a computer to execute:
a step of acquiring first image data to which information representing a position and a range is added;
a step of acquiring second image data obtained by reading a recording medium on which the first image data is printed;
a step of recognizing, from the second image data, content information entered on the recording medium in the region specified by the position and range; and
an update step of updating to third image data in which the recognized content information is added to the first image data,
wherein, in the update step, when there is a region in which the content information cannot be recognized, the second image data corresponding to that region is added to the first image data.