JP2011237840A - Document processing device

Info

Publication number: JP2011237840A
Application number: JP2010105894A
Authority: JP
Inventors: Masashi Kawasaki; 真史川崎
Original assignee: Murata Machinery Ltd
Current assignee: Murata Machinery Ltd
Priority date: 2010-04-30
Filing date: 2010-04-30
Publication date: 2011-11-24

Abstract

PROBLEM TO BE SOLVED: To provide a document processing device that easily generates templates for optical character recognition (OCR). SOLUTION: A document processing device recognizes areas 51 to 53 highlighted in a template image, and stores the coordinates of points indicating each area together with the character strings (specified characters) within each area as template information. Based on the stored template information, characters other than the specified characters within the areas defined by the template coordinates are recognized by OCR in newly read document image data and stored as variable parameters together with that document image data.

Description

The present invention relates to a document processing apparatus that extracts characters from an image of a document and uses them to manage the image.

Conventionally, it has been proposed to create a template for performing OCR (Optical Character Recognition) on completed forms by reading a form, extracting the line segments in the form, and automatically extracting specific areas of the form based on the extracted line segments.

JP 2005-267394 A

With the conventional technique, an OCR template cannot easily be created for a document other than a form, that is, a document without line segments.
An object of the present invention is to provide a technique that makes it easy to create OCR templates not only for forms but for a variety of documents.

A plurality of aspects are described below as means for solving the problem. These aspects can be combined arbitrarily as necessary.

A document processing apparatus according to a first aspect of the present invention recognizes, in an image obtained from a document, characters specified on the basis of designated characters as a variation parameter. The apparatus includes an image reading unit, an area position information acquisition unit, a designated character acquisition unit, and a template storage unit. The image reading unit reads an image on a document as a template image. The area position information acquisition unit acquires, as area position information, the position information of a designated area having a predetermined feature in the template image. The designated character acquisition unit acquires designated characters from a character string included in the designated area. The template storage unit stores the designated characters in association with the area position information.
With this document processing apparatus, a document having a specific format is read in advance, and information indicating the area where character recognition is to be performed on an unknown document (the position information of the designated area and the designated characters) can be acquired from it. An OCR template is therefore easy to create.

The document processing apparatus may further include a relative position information acquisition unit that acquires relative position information indicating the position of the variation parameter relative to the designated characters. The template storage unit may then store the relative position information in association with the designated characters. This allows the position of the area to be recognized to be defined more precisely.

The area position information acquisition unit may acquire the area position information by recognizing an area showing a specific color as the designated area.

A document processing apparatus according to a second aspect of the present invention recognizes, in an image obtained from a document, characters specified on the basis of designated characters as a variation parameter. The apparatus includes an image reading unit, a template storage unit, a template recognition unit, and a variation parameter extraction unit. The image reading unit reads an image on a document as a document image. The template storage unit stores designated characters and area position information in association with each other. The template recognition unit recognizes the designated characters within an extraction area, which is the area of the document image specified by the area position information. The variation parameter extraction unit extracts, as the variation parameter, characters that are present in the extraction area other than the designated characters.
In this way, the area of the document image in which character recognition should be performed to obtain the variation parameter is identified from the area position information and the designated characters. In other words, the required character string is obtained easily by performing character recognition only on a specific area rather than on the entire document image. The variation parameter extraction unit may extract all of the characters in the extraction area other than the designated characters as the variation parameter, or it may extract only some of them.
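The extraction described for the second aspect can be sketched as follows. This is a minimal Python illustration and not the implementation of the embodiment: it assumes that OCR has already produced the text of the extraction area, and the function name extract_variation_parameter is hypothetical.

```python
from typing import Optional


def extract_variation_parameter(region_text: str, designated: str) -> Optional[str]:
    """Return the text of the extraction area other than the designated characters.

    region_text: OCR result for the extraction area (assumed to be available).
    designated:  designated characters stored in the template, e.g. "Date:".
    Returns None when the designated characters are not found in the area.
    """
    position = region_text.find(designated)
    if position < 0:
        return None
    # Drop the designated characters and keep everything else in the area.
    remainder = region_text[:position] + region_text[position + len(designated):]
    return remainder.strip()


print(extract_variation_parameter("Date: January 15, 2010", "Date:"))
```

Returning None when the designated characters are missing leaves the caller free to decide what to do next, for example to retry with a wider area.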

The document processing apparatus may further include an area position information correction unit that, when the designated characters are not present in the extraction area, corrects the area position information so that it corresponds to a wider area.
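One way to picture this correction is a retry loop that enlarges the area until the designated characters are found. The sketch below is an assumption-laden illustration: the source does not say by how much the area is widened or how often, so the margin and max_tries parameters, the helper names, and the fake_ocr stand-in are all hypothetical.

```python
from typing import Callable, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2): two diagonal corners


def widen(box: Box, margin: int, page: Box) -> Box:
    """Grow the box by `margin` pixels on every side, clipped to the page."""
    x1, y1, x2, y2 = box
    px1, py1, px2, py2 = page
    return (max(px1, x1 - margin), max(py1, y1 - margin),
            min(px2, x2 + margin), min(py2, y2 + margin))


def find_region(box: Box, designated: str, ocr: Callable[[Box], str],
                page: Box, margin: int = 20, max_tries: int = 3) -> Optional[Box]:
    """Return the first (possibly widened) box whose OCR text contains `designated`."""
    for _ in range(max_tries + 1):
        if designated in ocr(box):
            return box
        wider = widen(box, margin, page)
        if wider == box:  # the box already covers the whole page
            break
        box = wider
    return None


def fake_ocr(box: Box) -> str:
    # Stand-in for a real OCR engine: the label is seen only if the box covers (100, 50).
    x1, y1, x2, y2 = box
    return "Date: 2010-01-15" if x1 <= 100 <= x2 and y1 <= 50 <= y2 else ""


print(find_region((110, 40, 300, 60), "Date:", fake_ocr, (0, 0, 600, 800)))
```

Bounding the number of retries keeps a missing label from turning into OCR over the whole page, which is the cost the template is meant to avoid.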

The document processing apparatus may further include a document storage unit that stores the variation parameter in association with the document image.

The document processing apparatus may further include a search reception unit that accepts a search term entered by a user, and a search unit that selects, from the document images in the document storage unit, a document image associated with a variation parameter that matches the search term.
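The relationship between the document storage unit and the search unit can be illustrated with a small in-memory example. This is only a sketch: the StoredDocument class, the file names, and the sample values are made up for illustration, and exact string equality is assumed as the meaning of "matches".

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StoredDocument:
    image_path: str  # hypothetical reference to the stored image data
    # Maps an index name to the variation parameter extracted for it.
    parameters: Dict[str, str] = field(default_factory=dict)


def search(store: List[StoredDocument], term: str) -> List[StoredDocument]:
    """Select the documents that have a variation parameter equal to the search term."""
    return [doc for doc in store if term in doc.parameters.values()]


# Illustrative sample data only.
store = [
    StoredDocument("scan_001.png", {"Date": "January 15, 2010", "Originator": "Taro Suzuki"}),
    StoredDocument("scan_002.png", {"Date": "February 1, 2010", "Originator": "Hanako Sato"}),
]
hits = search(store, "Taro Suzuki")
print([doc.image_path for doc in hits])
```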

The document processing apparatus may also have the configurations of both the first and second aspects. That is, it may include an image reading unit, an area position information acquisition unit, a designated character acquisition unit, a template storage unit, a template recognition unit, and a variation parameter extraction unit. The image reading unit reads an image on a document as a template image or as a document image. The area position information acquisition unit acquires, as area position information, the position of a designated area having a predetermined feature in the template image read by the image reading unit. The designated character acquisition unit acquires designated characters from a character string included in the designated area. The template storage unit stores the designated characters in association with the area position information. The template recognition unit recognizes the designated characters within an extraction area, which is the area specified by the area position information in the document image read by the image reading unit. The variation parameter extraction unit extracts, as the variation parameter, characters that are present in the extraction area other than the designated characters.

According to the present invention, a document having a specific format is read in advance, and information indicating the area where character recognition is to be performed on an unknown document (the area position information and the designated characters) can be acquired from it. An OCR template is therefore easy to create.

FIG. 1 is a block diagram showing an overview of a multifunction machine according to an embodiment of the present invention.
FIG. 2 is a flowchart of template registration processing in the multifunction machine.
FIG. 3A is a diagram showing an example of a template document.
FIG. 3B is a diagram showing an example of a designated area.
FIG. 3C is a diagram showing an example of a temporary template.
FIG. 4A is a diagram showing an example of a registered template.
FIG. 4B is a diagram showing another example of a registered template.
FIG. 4C is a diagram showing yet another example of a registered template.
A flowchart of document registration in the multifunction machine.
A diagram schematically showing template recognition and variation parameter extraction in a document image using the template of FIG. 4A.
A diagram schematically showing template recognition and variation parameter extraction in a document image using the template of FIG. 4B.
A diagram schematically showing template recognition and variation parameter extraction in a document image using the template of FIG. 4C.

(1) Configuration of the multifunction machine
The multifunction machine of the present embodiment is described with reference to FIG. 1. FIG. 1 is a block diagram showing the overall configuration of the multifunction machine 1.

The multifunction machine 1 is an apparatus that functions as a copier, a printer, a scanner, and a facsimile machine.
As shown in FIG. 1, the multifunction machine 1 includes an NCU (Network Control Unit) 11, a modem 12, a network interface 13, an image reading unit 14, an image forming unit 15, an operation panel 16, a storage unit 17, and a control device 18. The units in the multifunction machine 1 are connected by a bus.

The NCU 11 is connected to a PSTN (Public Switched Telephone Network) 101 and controls communication over the telephone line, including outgoing and incoming calls.

The modem 12 is a modulator-demodulator that converts between digital and analog signals.

The network interface 13 allows the multifunction machine 1 to connect to the Internet 102 and to communicate with devices outside the multifunction machine 1. An interface connectable to a LAN (Local Area Network) may be provided as the network interface 13 so that the multifunction machine 1 can communicate with other devices connected to that LAN.
The image reading unit 14 acquires image data by reading an image on a document. In the present embodiment, the image reading unit 14 can acquire color image data. Specifically, the image reading unit 14 includes a document conveying device, a platen glass, a light source, a group of mirrors, a lens, a reading element, a signal processing circuit, and the like (none of which are shown). A document conveyed by the document conveying device or placed on the platen glass is illuminated by the light source. Light reflected from the document is guided onto the reading element by the mirrors and the lens. The reading element outputs an electrical signal corresponding to the received light, and the signal processing circuit applies processing such as digitization to this signal to obtain image data.

The image forming unit 15 forms an image on paper in accordance with image data. A device that forms images by a method such as electrophotography or inkjet printing is used as the image forming unit 15.

The operation panel 16 includes a touch panel 161, which has a display panel and a touch sensor, as well as hard keys 162 and the like. By displaying images on the display panel, the touch panel 161 informs the user of the operating status of the image forming apparatus, the occurrence of errors, and similar information, and it accepts user operations through soft keys. The hard keys 162 include a numeric keypad, an execute key, a cancel key, and the like, and also accept user operations. Using the operation panel 16, the user can enter instructions to execute, for example, copying, scanning, or facsimile transmission.

The storage unit 17 is a non-volatile memory whose contents are retained even when the multifunction machine 1 is powered off. Specifically, the storage unit 17 has a template storage area (template storage unit) 171 and a document storage area (document storage unit) 172. The template storage area 171 stores the templates described later. The document storage area 172 stores the image data and variation parameters described later.

The control device 18 controls the operation of each unit in the multifunction machine 1 and executes various calculations and other processing. The control device 18 includes an arithmetic unit such as an MPU (Micro Processing Unit) and storage devices such as a flash memory and an SDRAM (Synchronous Dynamic Random Access Memory). The MPU implements the functions of the control device 18 by reading and executing programs stored in the flash memory. Specifically, the control device 18 controls the operation of each unit of the multifunction machine 1 to execute operations such as copying, printing, facsimile transmission and reception, and scanning. The control device 18 also functions as a character recognition unit 20, a template acquisition unit 31 (area position information acquisition unit, designated character acquisition unit, and relative position information acquisition unit), a template recognition unit 41, a coordinate correction unit 42 (area position information correction unit), a variation parameter extraction unit 43, and a search unit 44. The function of each unit is described later. The flash memory stores various programs. The SDRAM is a readable and writable memory that serves as a work area when the MPU executes a program.

In addition to the configuration described above, the multifunction machine 1 has various other devices as necessary. For example, the multifunction machine 1 includes a CODEC (Coder and Decoder), not shown. The CODEC compresses (encodes) and decompresses (decodes) image data.

(2) Template registration
Template registration is described with reference to FIG. 2, FIGS. 3A to 3C, and FIGS. 4A to 4C. FIG. 2 is a flowchart showing the flow of template registration, FIG. 3A shows an example of a template document, FIG. 3B shows an example of a designated area, and FIG. 3C shows an example of a temporary template. FIGS. 4A to 4C show examples of registered templates.

(2-1) Reading the template document: step S1
As shown in FIG. 2, the image reading unit 14 first reads the image on the template document 50 and acquires image data (step S1). As shown in FIG. 3A, the template document 50 has a first designated area 51, a second designated area 52, and a third designated area 53, each marked with a color marker.

(2-2) Acquiring the coordinates of the designated areas: steps S2 to S3
The template acquisition unit 31 functions as the area position information acquisition unit.

That is, the template acquisition unit 31 recognizes a designated area by recognizing an area marked with a color specified in advance (step S2), and acquires its coordinates as area position information (step S3). For example, when yellow has been specified as the marker color, the template acquisition unit 31 recognizes the areas marked in yellow in the image of the template document 50.
The marker is only one example of a feature that a designated area may have. It is sufficient for a designated area to have a predetermined feature that distinguishes it from other areas of the template document. Besides markers, such features include typeface, underlining, and colored characters.

Furthermore, as shown in FIG. 3B, the template acquisition unit 31 acquires the coordinates of two diagonally opposite points of the recognized area. In the example of FIG. 3B, the coordinates (X11, Y11) and (X12, Y12) are acquired as the area position information corresponding to the first designated area 51. The rectangular area whose diagonal is defined by the two coordinates identified in this way is hereinafter called a "template area" and is denoted by reference numeral 54 in FIGS. 3A and 3B.

Similarly, the coordinates of a second template area 55 corresponding to the second designated area 52 and the coordinates of a third template area 56 corresponding to the third designated area 53 are acquired (FIGS. 3A and 3C).

The color may be set by default or may be entered through the operation panel 16.
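Steps S2 and S3 can be illustrated with a toy example that finds the pixels close to a marker color and returns the two diagonal corners of their bounding rectangle. This is a sketch under stated assumptions, not the method of the embodiment: the image is a plain nested list of RGB tuples, the tolerance value is arbitrary, the function names are hypothetical, and only a single marked area is handled (a real page with several markers would first need the marked pixels grouped into separate areas).

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B)


def is_marker(pixel: Pixel, marker: Pixel, tolerance: int = 40) -> bool:
    """True if every channel is within `tolerance` of the marker color."""
    return all(abs(p - m) <= tolerance for p, m in zip(pixel, marker))


def marker_bounding_box(image: List[List[Pixel]],
                        marker: Pixel) -> Optional[Tuple[int, int, int, int]]:
    """Return (x1, y1, x2, y2), two diagonal corners of the marked area."""
    xs: List[int] = []
    ys: List[int] = []
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if is_marker(pixel, marker):
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # no pixel of the marker color was found
    return (min(xs), min(ys), max(xs), max(ys))


WHITE: Pixel = (255, 255, 255)
YELLOW: Pixel = (255, 255, 0)

# A 5 x 4 white image with a yellow block covering columns 1-3 of rows 1-2.
image = [[WHITE] * 5 for _ in range(4)]
for y in (1, 2):
    for x in (1, 2, 3):
        image[y][x] = YELLOW

print(marker_bounding_box(image, YELLOW))
```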

(2-3) Acquiring the temporary template: steps S4 to S6
The template acquisition unit 31 also functions as the designated character acquisition unit and the relative position information acquisition unit. The template acquisition unit 31 acquires a temporary template, and the template is then edited to obtain the final template (including coordinates, designated characters, and relative position information).

Acquisition of the temporary template is described first. Under the control of the template acquisition unit 31, the character recognition unit 20 performs OCR processing on each template area (step S4). That is, the character recognition unit 20 recognizes the character string contained in the template area from the image data. As shown in FIG. 3A, the character string in the first template area 54 is "Technical document: ZZZ report", the character string in the second template area 55 is "Date: January 15, 2010", and the character string in the third template area 56 is "Author: Taro Suzuki".

The character string recognized in this way and the coordinates acquired in step S3 are stored as a temporary template in a storage medium such as the SDRAM described above (step S5). If the template document 50 contains another area showing the specified color (No in step S6), coordinate acquisition, OCR processing, and storage of the temporary template are executed for that area (steps S3 to S5). Once the temporary template has been stored for every area showing the specified color in the template document 50 (Yes in step S6), the next process is performed.
In the example of FIG. 3A, the above processing is executed for the three designated areas 51 to 53.

The temporary template is now described in detail. As shown in FIG. 3C, the temporary template 60 has first information 61, second information 62, and third information 63. The first information 61 contains the coordinates of the first template area 54 and, as its Index, the character string obtained from the first template area 54, "Technical document: ZZZ report". The second information 62 contains the coordinates of the second template area 55 and, as its Index, the character string obtained from the second template area 55, "Date: January 15, 2010". The third information 63 contains the coordinates of the third template area 56 and, as its Index, the character string obtained from the third template area 56, "Author: Taro Suzuki".
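The temporary template can be pictured as a list of records, each pairing the two corner coordinates of a template area with the raw OCR text of that area. The sketch below is illustrative only: the class and field names are hypothetical, and the numeric coordinates are placeholder values, since the source gives the coordinates only as symbols such as (X11, Y11).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TemporaryEntry:
    corner1: Tuple[int, int]  # one corner of the template area, e.g. (X11, Y11)
    corner2: Tuple[int, int]  # the diagonally opposite corner, e.g. (X12, Y12)
    index: str                # raw OCR text of the template area, not yet edited


# Placeholder coordinates; the strings are the OCR results from the example.
temporary_template: List[TemporaryEntry] = [
    TemporaryEntry((100, 80), (520, 110), "Technical document: ZZZ report"),
    TemporaryEntry((100, 130), (520, 160), "Date: January 15, 2010"),
    TemporaryEntry((100, 180), (520, 210), "Author: Taro Suzuki"),
]

for entry in temporary_template:
    print(entry.corner1, entry.corner2, entry.index)
```

At this stage the Index still contains the variable part of each line, which is why an editing step follows.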

(2-4) Editing the template: step S7
As the next process, the temporary template is edited (step S7).
Specifically, the temporary template 60 is displayed on the touch panel 161 together with soft keys that accept edits to the information contained in the template. "Editing" includes addition, change, and deletion. Information that is not to be edited need not be displayed. For example, when the coordinates are set to be non-editable, they need not be displayed as part of the temporary template.

The template acquisition unit 31 obtains the final template by editing the temporary template based on the input the user makes through these soft keys. The final template contains the coordinates described above and may also contain an index name, designated characters, and relative position information.

The index name is a character string specified by the user and generally indicates the category of the variation parameter described later. Specifically, alphabetic characters are specified as the index name to represent a type of document, such as technical document, annual report, monthly report, or experiment report; a type of date, such as creation date or submission date; or a type of personal name, such as the author, the person in charge, or the recipient of the document.

"Designated characters" is a concept that covers both the Index and the Delimiter, and it includes symbols. The Index is a character string written alongside the variation parameter described later. The Delimiter is a symbol marking the boundary between the variation parameter and another character string, for example the boundary between the Index and the variation parameter. A colon, a semicolon, or the like is used as the Delimiter.

Relative position information indicates the position of the variation parameter relative to the designated characters, for example the direction in which the variation parameter lies with respect to the designated characters. The direction is entered by displaying soft keys for the choices "right", "left", "up", and "down" and having the user press one of them.
The template acquisition unit 31 can also correct the coordinates that make up the area position information in response to a user instruction. That is, a user who sees the coordinates displayed on the touch panel 161 can correct them by entering more appropriate values.

Specific examples of the template are described below.
(i) First example
The template 70 of FIG. 4A includes first template information 701, second template information 702, and third template information 703. These are obtained by editing the first information 61, the second information 62, and the third information 63, respectively. The template information 701 to 703 contains the coordinates of the first template area 54, the second template area 55, and the third template area 56, respectively, together with an index name, designated characters, and relative position information.

As shown in FIG. 4A, the first template information 701 has, as its index name (Index name), the character string "Technical Doc", which corresponds to the document name "Technical document". The second template information 702 has the character string "Date", which corresponds to "Date". The third template information 703 has the character string "Originator", which corresponds to "Author". These character strings are entered by the user.

As shown in FIG. 4A, the first template information 701 has the character string "Technical document:" as its Index, which serves as the designated characters. This string is obtained by deleting, at the user's instruction, the character string "ZZZ report" from the Index of the first information 61 of the temporary template (FIG. 3C). The second template information 702 has the character string "Date:" as its Index, obtained by deleting the character string "January 15, 2010" from the Index of the second information 62. The third template information 703 has the character string "Author:" as its Index, obtained by deleting the character string "Taro Suzuki" from the Index of the third information 63. Thus, in the template information 701 to 703 of FIG. 4A, the Index, which is the designated characters, includes both the character string indicating the category of the variation parameter and the Delimiter.

The template document 50 contains, in the designated areas 51 to 53, character strings other than the designated characters ("ZZZ report", "January 15, 2010", and "Taro Suzuki", which correspond to the variation parameters described later). The designated characters are therefore obtained by deleting these character strings from the temporary template 60. If the template document contains no character strings other than the designated characters in the designated areas, this deletion is unnecessary.
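The effect of this deletion can be shown with a short example. In the embodiment the user deletes the variable part by hand through the soft keys; the sketch below merely illustrates the resulting transformation by splitting at the first delimiter, which is an assumption and not something the source describes. The function name split_at_delimiter is hypothetical.

```python
from typing import Tuple


def split_at_delimiter(ocr_text: str, delimiter: str = ":") -> Tuple[str, str]:
    """Split OCR text into (designated characters, remaining text).

    The designated characters keep the delimiter, as in the first example
    ("Date:"). If no delimiter is present, the whole text is treated as
    designated characters and nothing needs to be deleted.
    """
    head, found, tail = ocr_text.partition(delimiter)
    if not found:
        return ocr_text.strip(), ""
    return (head + found).strip(), tail.strip()


for text in ("Technical document: ZZZ report",
             "Date: January 15, 2010",
             "Author: Taro Suzuki"):
    print(split_at_delimiter(text))
```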

As shown in FIG. 4A, the first template information 701, the second template information 702, and the third template information 703 each have the value “Right” as relative position information (Location).

(ii) Second Example
The template 71 of FIG. 4B contains no Index as designated characters and has only a Delimiter. Otherwise, the template 71 of this example has the same configuration as the template 70 of the first example. Specifically, the first template information 711, the second template information 712, and the third template information 713 of the template 71 have a colon (:) as the Delimiter and contain no Index.
In this way, the template acquisition unit 31 can delete the “Index” contained in the temporary template 60 in accordance with a user instruction, and can also add a “Delimiter”.

(iii) Third Example
The template 72 of FIG. 4C has the same configuration as the template 70 of the first example, except that it contains both an Index and a Delimiter as designated characters, each as a separate item of information.

That is, the first template information 721 of the template 72 contains “Technical document” as the Index and a colon as the Delimiter. The second template information 722 contains “Date” as the Index and a colon as the Delimiter. The third template information 723 contains “Creator” as the Index and a colon as the Delimiter.
In this way, the template acquisition unit 31 can delete part of the Index contained in the temporary template 60, and can also add a “Delimiter”.
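The three template variants can be pictured as one record type with optional fields. The following Python sketch is only an illustration: the class name `TemplateInfo`, its field names, and the coordinate values are assumptions, since the source names only the Index, Delimiter, and Location items and the two coordinates of each area.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[int, int]


@dataclass(frozen=True)
class TemplateInfo:
    """One template entry: an area, its designated characters, and a Location."""
    top_left: Point
    bottom_right: Point
    index: Optional[str] = None      # label text, e.g. "Date" or "Date:"
    delimiter: Optional[str] = None  # separator, e.g. ":"
    location: str = "Right"          # side on which the variation parameter lies

    def designated_text(self) -> str:
        """Text to look for in the OCR result (Index followed by Delimiter)."""
        return (self.index or "") + (self.delimiter or "")


# Coordinates are placeholders; the source does not give concrete values.
area = ((100, 50), (400, 80))
first_example = TemplateInfo(*area, index="Date:")                # FIG. 4A style
second_example = TemplateInfo(*area, delimiter=":")               # FIG. 4B style
third_example = TemplateInfo(*area, index="Date", delimiter=":")  # FIG. 4C style

for entry in (first_example, second_example, third_example):
    print(repr(entry.designated_text()), entry.location)
```

Keeping the Index and Delimiter as separate optional fields lets the same record express all three examples without changing the matching logic.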

(2-5) Template registration: Steps S8 and S9
After editing is completed (Yes in step S8), the latest edited template is stored in the template storage area 171 of the storage unit 17 under the control of the template acquisition unit 31 (step S9). The coordinates of the template area and the designated characters are thus stored in association with each other.

Completion of editing is indicated, for example, by pressing a start key provided as a hard key, a “Done” key provided as a soft key, or a “Save” key provided as a soft key.

(3) Document registration
The multifunction device 1 can register an image of a document containing characters (a document) in association with variation parameters and use those variation parameters to manage the document. In other words, the variation parameters are used as an index.
Document registration is described below with reference to FIG. 5, which is a flowchart showing the flow of document registration.

(3-1) Template selection: S11
Via the operation panel 16, the multifunction device 1 accepts the user's selection of the template to be used in the following processing (step S11).

(3-2) Document reading: S12 to S20
The image reading unit 14 reads an image from the document set by the user (step S12).

(3-3) Extraction of variation parameters: Steps S13 to S19
The template recognition unit 41 extracts, from the obtained image, the areas specified by the coordinates contained in the template selected in step S11 (step S13).

Under the control of the template recognition unit 41, the character recognition unit 20 performs OCR processing on each extracted area (hereinafter referred to as an “extraction area”) (step S14).

The template recognition unit 41 determines whether the character string thus obtained contains the designated characters (Index and/or Delimiter) of the template (step S15).

If characters matching the designated characters are contained (Yes in step S16), the variation parameter extraction unit 43 extracts, as a variation parameter, the character string located at the position specified, relative to the designated characters, by the relative position information in the template (step S17).
The variation parameter extraction unit 43 may extract all of the characters present in the extraction area other than the designated characters as the variation parameter, or only some of them. For example, a limit may be set on the number of characters to be extracted as a variation parameter, and when the character string in the extraction area exceeds this limit, the variation parameter extraction unit 43 may extract only as many characters as the limit allows.
The characters extracted by the variation parameter extraction unit 43 are displayed on the touch panel 161 so that the user can check the extracted content. The variation parameter extraction unit 43 may further correct the variation parameter (for example, by deleting or changing part of the character string) in response to a user instruction or the like.
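Steps S15 to S17, together with the optional character limit, can be sketched in Python as follows. This is a minimal illustration and not the device's implementation: the function name, the `max_chars` parameter, and the “Left” branch are assumptions (the source only shows the Location value “Right”), and the OCR result is assumed to be available as a plain string.

```python
from typing import Optional


def extract_variation_parameter(
    ocr_text: str,
    designated: str,
    location: str = "Right",
    max_chars: Optional[int] = None,
) -> Optional[str]:
    """Return the text adjacent to `designated`, or None if it is absent.

    `location` says on which side of the designated characters the
    variation parameter lies. Only "Right" appears in the source;
    "Left" is an assumed counterpart.
    """
    before, found, after = ocr_text.partition(designated)
    if not found:
        return None  # corresponds to "No" in step S16
    if location == "Right":
        value = after.strip()
    elif location == "Left":
        value = before.strip()
    else:
        raise ValueError(f"unsupported location: {location!r}")
    if max_chars is not None:
        value = value[:max_chars]  # keep at most `max_chars` characters
    return value


print(extract_variation_parameter("Date: March 15, 2010", "Date:"))
print(extract_variation_parameter("Date: March 15, 2010", "Date:", max_chars=5))
print(extract_variation_parameter("March 15, 2010", "Date:"))
```

Returning `None` when the designated characters are missing keeps the "not found" case distinct from an empty variation parameter, so a caller can decide whether to widen the area and try again.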

If no characters matching the designated characters are contained (No in step S16), the coordinate correction unit 42 corrects the coordinates so that the extraction area becomes wider (step S18). Specifically, the coordinates are corrected so that the difference between the X components and/or the difference between the Y components of the two coordinates in one item of template information increases. The processing from area extraction onward is then performed again based on the corrected coordinate information.
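One possible form of the coordinate correction in step S18 is sketched below in Python. The source states only that the X and/or Y differences between the two coordinates are increased; the fixed `margin`, the symmetric growth on all four sides, and the clamping to the page size are assumptions made for this illustration.

```python
from typing import Tuple

Rect = Tuple[int, int, int, int]  # (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2


def widen_region(rect: Rect, margin: int, page: Tuple[int, int]) -> Rect:
    """Grow the rectangle by `margin` pixels on every side, clamped to the page."""
    x1, y1, x2, y2 = rect
    width, height = page
    return (
        max(0, x1 - margin),
        max(0, y1 - margin),
        min(width, x2 + margin),
        min(height, y2 + margin),
    )


print(widen_region((100, 50, 400, 80), margin=20, page=(1000, 1400)))
print(widen_region((10, 5, 400, 80), margin=20, page=(1000, 1400)))
```

Clamping keeps the corrected area inside the scanned image, so repeated widening cannot produce coordinates outside the page.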

The above processing is repeated until variation parameters have been acquired for all extraction areas specified by one template (step S19).
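The overall loop of steps S13 to S19 might look like the following Python sketch. The OCR engine is passed in as a callable so that the example stays self-contained. `extract_all`, `fake_ocr`, the `margin`, and the `max_retries` cap are all hypothetical; in particular, the source does not say when widening stops, so the retry cap is an assumption added to guarantee termination.

```python
from typing import Callable, List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def extract_all(
    entries: List[Tuple[Rect, str]],
    ocr: Callable[[Rect], str],
    margin: int = 20,
    max_retries: int = 3,
) -> List[Optional[str]]:
    """Run steps S13 to S19 for every (area, designated characters) pair."""
    results: List[Optional[str]] = []
    for rect, designated in entries:
        value: Optional[str] = None
        for _ in range(max_retries + 1):
            text = ocr(rect)                              # S13, S14
            _, found, after = text.partition(designated)  # S15
            if found:                                     # S16: Yes
                value = after.strip()                     # S17
                break
            x1, y1, x2, y2 = rect                         # S16: No, so S18
            rect = (x1 - margin, y1 - margin, x2 + margin, y2 + margin)
        results.append(value)                             # S19: next area
    return results


def fake_ocr(rect: Rect) -> str:
    """Stand-in OCR: the label is only visible once the area reaches x1 <= 90."""
    return "Date: March 15, 2010" if rect[0] <= 90 else "March 15, 2010"


print(extract_all([((100, 50, 400, 80), "Date:")], fake_ocr))
```

In this run the first attempt misses the label, the area is widened once, and the second attempt finds “Date:” and extracts the text to its right.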

Specific examples of variation parameter extraction are described below with reference to FIGS. 6A to 6C, which schematically show template recognition and variation parameter extraction when the above-described templates 70 to 72, respectively, are applied to an image obtained from the same document.

(i) First Example
When the template 70 of FIG. 4A is selected in step S11, the first extraction area 81, the second extraction area 82, and the third extraction area 83 are extracted from the document image 80 based on the coordinates of the first template information 701, the second template information 702, and the third template information 703, as shown in FIG. 6A (step S13).

In the first extraction area 81, a character string 811 matching the Index “Technical document:” of the first template information 701 is recognized, and based on the relative position information, the character string “Experiment report”, which is the variation parameter 812 located to the right of the character string 811, is extracted (steps S14 to S17).

In the second extraction area 82, a character string 821 matching the Index “Date:” of the second template information 702 is recognized, and based on the relative position information, the character string “March 15, 2010”, which is the variation parameter 822 located to the right of the character string 821, is extracted (steps S14 to S17).

In the third extraction area 83, a character string 831 matching the Index “Creator:” of the third template information 703 is recognized, and based on the relative position information, the character string “Jiro Tanaka”, which is the variation parameter 832 located to the right of the character string 831, is extracted (steps S14 to S17).

(ii) Second Example
This example uses a different template, and in particular different designated characters, but otherwise performs the same processing as the first example and extracts the same variation parameters.

When the template 71 of FIG. 4B is selected in step S11, the first extraction area 81, the second extraction area 82, and the third extraction area 83 are likewise extracted from the document image 80, as shown in FIG. 6B (step S13).

In the first extraction area 81, a symbol 841 matching the Delimiter of the first template information 711 is recognized, and the variation parameter 812 is extracted based on the relative position information (steps S14 to S17).

In the second extraction area 82, a symbol 842 matching the Delimiter of the second template information 712 is recognized, and the variation parameter 822 is extracted based on the relative position information (steps S14 to S17).

In the third extraction area 83, a symbol 843 matching the Delimiter of the third template information 713 is recognized, and the variation parameter 832 is extracted based on the relative position information (steps S14 to S17).

(iii) Third Example
This example uses a different template, and in particular different designated characters, but otherwise performs the same processing as the first example and extracts the same variation parameters.

When the template 72 of FIG. 4C is selected in step S11, the first extraction area 81, the second extraction area 82, and the third extraction area 83 are likewise extracted from the document image 80, as shown in FIG. 6C (step S13).

In the first extraction area 81, a character string 851 matching the Index “Technical document” of the first template information 721 and the symbol 841 matching the Delimiter are recognized (steps S14 to S16). Based on the relative position information, the variation parameter 812 located to the right of the symbol 841 is then extracted (steps S14 to S17).

In the second extraction area 82, a character string 852 matching the Index “Date” of the second template information 722 and the symbol 842 matching the Delimiter are recognized (steps S14 to S16). Based on the relative position information, the variation parameter 822 located to the right of the symbol 842 is then extracted (steps S14 to S17).

In the third extraction area 83, a character string 853 matching the Index “Creator” of the third template information 723 and the symbol 843 matching the Delimiter are recognized (steps S14 to S16). Based on the relative position information, the variation parameter 832 located to the right of the symbol 843 is then extracted (steps S14 to S17).
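The statement that all three templates yield the same variation parameters can be checked with a short Python sketch. The text lines stand in for the OCR results of the extraction areas 81 to 83, written as English renderings of the example. The variable names and the helper `right_of` are hypothetical.

```python
document_lines = [
    "Technical document: Experiment report",
    "Date: March 15, 2010",
    "Creator: Jiro Tanaka",
]

# Designated characters per template: Index and Delimiter as one string (70),
# Delimiter only (71), and Index and Delimiter held separately (72).
template_70 = ["Technical document:", "Date:", "Creator:"]
template_71 = [":", ":", ":"]
template_72 = [("Technical document", ":"), ("Date", ":"), ("Creator", ":")]


def right_of(text: str, marker: str) -> str:
    """Return the text to the right of the first occurrence of `marker`."""
    _, found, after = text.partition(marker)
    if not found:
        raise ValueError(f"{marker!r} not found in {text!r}")
    return after.strip()


result_70 = [right_of(t, m) for t, m in zip(document_lines, template_70)]
result_71 = [right_of(t, m) for t, m in zip(document_lines, template_71)]
# For template 72 the Index must be present, and the value follows the Delimiter.
result_72 = [
    right_of(t[t.index(idx) + len(idx):], delim)
    for t, (idx, delim) in zip(document_lines, template_72)
]

print(result_70 == result_71 == result_72)
print(result_70)
```

The Delimiter-only variant relies on the first colon in the area being the separator, which holds for these sample lines but would need extra care if a label itself contained a colon.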

(3-4) Document registration: Step S20
After acquisition of the variation parameters is completed (Yes in step S19), the image data of the document and the variation parameters are stored in association with each other in the document storage area 172 (step S20). Specifically, the variation parameters are saved as XML (Extensible Markup Language) data.
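The source does not specify the XML layout, so the following Python sketch shows only one plausible shape. The element names `document` and `parameter`, the attribute names, and the file name are assumptions. The code uses the standard-library `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def variation_parameters_to_xml(image_file: str, params: Dict[str, str]) -> str:
    """Serialize the variation parameters of one document image as an XML string."""
    root = ET.Element("document", attrib={"image": image_file})
    for name, value in params.items():
        item = ET.SubElement(root, "parameter", attrib={"name": name})
        item.text = value
    return ET.tostring(root, encoding="unicode")


xml_text = variation_parameters_to_xml(
    "scan_0001.tif",
    {"Technical document": "Experiment report", "Date": "March 15, 2010"},
)
print(xml_text)
```

Storing the image file name as an attribute of the root element is one simple way to keep the variation parameters associated with the image data.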

(4) Document search
The variation parameters registered in this way are used, for example, to classify and search documents.
When the user enters a search term via the operation panel 16 and instructs execution of a search, the search unit 44 searches the document storage area 172 for images having a variation parameter that matches the search term. The search results are displayed on the touch panel 161. The user can thus easily obtain the desired image.
In the search unit 44, a search method such as partial match or exact match may be set by default, or the search method may be changeable by user designation.
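Partial-match and exact-match search over the registered variation parameters could be sketched in Python as follows. The in-memory dictionary is a hypothetical stand-in for the document storage area 172, and the function name, the `exact` flag, and the file names are assumptions.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for the document storage area 172:
# image name -> variation parameters registered with that image.
DocumentStore = Dict[str, List[str]]


def search(store: DocumentStore, term: str, exact: bool = False) -> List[str]:
    """Return the images whose variation parameters match `term`."""
    hits = []
    for image, params in store.items():
        if exact:
            matched = any(term == p for p in params)
        else:
            matched = any(term in p for p in params)
        if matched:
            hits.append(image)
    return hits


store = {
    "scan_0001.tif": ["Experiment report", "March 15, 2010", "Jiro Tanaka"],
    "scan_0002.tif": ["ZZZ report", "January 15, 2010", "Taro Suzuki"],
}
print(search(store, "report"))
print(search(store, "report", exact=True))
print(search(store, "Taro Suzuki", exact=True))
```

Here partial match is the default and exact match is opt-in, which is one way to model a default search method that the user can change.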

An embodiment of the present invention has been described above; however, the present invention is not limited to this embodiment, and various modifications are possible without departing from the gist of the invention.

(5) Features
The multifunction device 1 recognizes, as a variation parameter, characters identified based on designated characters in an image obtained from a document. The multifunction device 1 includes the image reading unit 14, the template acquisition unit 31, and the template storage area 171. The image reading unit 14 reads an image on a document. The template acquisition unit 31 acquires the coordinates of the designated areas 51 to 54, which have a predetermined feature, in the template image read by the image reading unit 14. The template acquisition unit 31 also acquires the designated characters from the character strings contained in the designated areas 51 to 54. The template storage area 171 stores the designated characters in association with the coordinates.

With the multifunction device 1, a document having a specific format can be read in advance to obtain information (area position information and designated characters) indicating the areas in which character recognition should be performed on an unknown document. An OCR template is therefore created easily.

In the multifunction device 1, the template acquisition unit 31 also functions as a relative position information acquisition unit that acquires relative position information representing the position of the variation parameter relative to the designated characters. In addition, the template storage area 171 may store the relative position information in association with the designated characters. This allows the position of the area to be subjected to character recognition to be defined more precisely.

In the multifunction device 1, the template acquisition unit 31 recognizes areas exhibiting a specific color.

The multifunction device 1 also includes the template recognition unit 41 and the variation parameter extraction unit 43. The template recognition unit 41 recognizes the designated characters within the extraction areas 81 to 83, which are the areas of the document image 80 specified by the coordinates. The variation parameter extraction unit 43 extracts, as variation parameters, the characters present in the extraction areas 81 to 83 other than the designated characters.

In this way, the areas of the document image in which character recognition should be performed to obtain variation parameters are identified based on the coordinates and designated characters described above. In other words, the required character strings are obtained easily by performing character recognition only on specific areas rather than on the entire document image.

The multifunction device 1 further includes the coordinate correction unit 42, which corrects the area position information so that it corresponds to a wider area when the designated characters are not present in the extraction areas 81 to 83.
The multifunction device 1 further includes the document storage area 172, which stores the variation parameters in association with the document image.

In the multifunction device 1, the operation panel 16 accepts input of a search term from the user. The search unit 44 selects, from the document images in the document storage unit, a document image associated with a variation parameter that matches the search term.

The present invention is applicable to apparatuses having an image reading function, such as facsimile machines, scanners, and multifunction devices.

DESCRIPTION OF SYMBOLS
1 Multifunction device
12 Modem
13 Network interface
14 Image reading unit
15 Image forming unit
16 Operation panel
17 Storage unit
18 Control device
20 Character recognition unit
31 Template acquisition unit (area position information acquisition unit, designated character acquisition unit, relative position information acquisition unit)
41 Template recognition unit
42 Coordinate correction unit
43 Variation parameter extraction unit
44 Search unit
50 Template document
51 First designated area
52 Second designated area
53 Third designated area
54 First template area
55 Second template area
56 Third template area
60 Temporary template
70 Template
71 Template
72 Template
102 Internet
161 Touch panel (search reception unit)
162 Hard keys
171 Template storage area (template storage unit)
172 Document storage area

Claims

A document processing apparatus that recognizes, as a variation parameter, a character specified based on a designated character within an area specified by area position information in an image obtained from a document, the document processing apparatus comprising:
an image reading unit that reads an image on a document as a template image;
an area position information acquisition unit that acquires, as the area position information, the position of a designated area having a predetermined feature in the template image;
a designated character acquisition unit that acquires the designated character from a character string contained in the designated area; and
a template storage unit that stores the designated character in association with the area position information.

The document processing apparatus according to claim 1, further comprising a relative position information acquisition unit that acquires relative position information representing the position of the variation parameter relative to the designated character,
wherein the template storage unit stores the relative position information in association with the designated character.

The document processing apparatus according to claim 1, wherein the area position information acquisition unit acquires the area position information by recognizing an area indicating a specific color as the designated area.

A document processing apparatus that recognizes, as a variation parameter, a character specified based on a designated character within an area specified by area position information in an image obtained from a document, the document processing apparatus comprising:
an image reading unit that reads an image on a document as a document image;
a template storage unit that stores the designated character and the area position information in association with each other;
a template recognition unit that recognizes the designated character within an extraction area, which is the area of the document image specified by the area position information; and
a variation parameter extraction unit that extracts, as the variation parameter, a character present in the extraction area other than the designated character.

The document processing apparatus according to claim 4, further comprising an area position information correction unit that corrects the area position information so as to correspond to a wider area when the designated character does not exist in the extraction area.

The document processing apparatus according to claim 4, further comprising a document storage unit that stores the variation parameter in association with the document image.

The document processing apparatus according to claim 6, further comprising:
a search reception unit that accepts input of a search term from a user; and
a search unit that selects, from the document images in the document storage unit, the document image associated with the variation parameter that matches the search term.

A document processing apparatus that recognizes, as a variation parameter, a character specified based on a designated character within an area specified by area position information in an image obtained from a document, the document processing apparatus comprising:
an image reading unit that reads an image on a document as a template image or a document image;
an area position information acquisition unit that acquires, as the area position information, the position of a designated area having a predetermined feature in the template image;
a designated character acquisition unit that acquires the designated character from a character string contained in the designated area;
a template storage unit that stores the designated character in association with the area position information;
a template recognition unit that recognizes the designated character within an extraction area, which is the area of the document image specified by the area position information; and
a variation parameter extraction unit that extracts, as the variation parameter, a character present in the extraction area other than the designated character.