JP2016111482A

JP2016111482A - Image processing device and control method of image processing device

Info

Publication number: JP2016111482A
Application number: JP2014246330A
Authority: JP
Inventors: 嘉建水野; Yoshitake Mizuno
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2014-12-04
Filing date: 2014-12-04
Publication date: 2016-06-20

Abstract

PROBLEM TO BE SOLVED: To provide a method that, when a document set consisting of multiple pages is digitized, stores the digitized documents either in a single appropriately named folder or divided among multiple appropriately named folders.
SOLUTION: An image processing apparatus of the present invention comprises: input means for inputting image data obtained by reading a plurality of document sets, each consisting of a plurality of pages; display control means for displaying, in different display forms, (i) character strings determined to match at a common position between the character strings contained in first image data corresponding to a first document set and the character strings contained in second image data corresponding to a second document set among the input image data, and (ii) character strings determined not to match at such a position; and control means for performing control so as to store the first image data in a storage unit using a character string designated from among the displayed character strings.
SELECTED DRAWING: Figure 17

Description

The present invention relates to an image processing apparatus that generates an electronic document using image data, and to a control method for the image processing apparatus.

An image processing apparatus can digitize image data obtained by reading a paper document and store it in a storage unit inside the apparatus. Conventionally, when digitized image data is stored in this way, a character string formed by appending a serial number to device-specific header information or a fixed character string is often used as the file name of the digitized image data file and as the name of the folder in which that file is saved. Therefore, if the user of the image processing apparatus wants to give the folder or file to be saved an arbitrary name, the user must enter the file name or folder name from the operation panel or a similar input device of the apparatus.

To address this, Patent Document 1 improves user convenience by increasing the correlation between the document image data and the name of the folder or file in which the digitized image data read by the image processing apparatus is stored. In this method, if a character or character string can be recognized at a predetermined position on the document, the recognized character or character string is used as the file name; if it cannot be recognized, the date and time of reading is used as the file name.

Patent Document 1: JP 2005-56315 A

However, with the method disclosed in Patent Document 1, the recognized character or character string may not match the folder name or file name the user wants to specify. Moreover, when the date and time of reading becomes the folder name or file name, it is difficult to relate the content of the image data obtained by scanning the document to the name of the file in which that data is saved or to the name of the folder in which the file is stored.

In addition, when saving input image data, it may be hard for the user to know which character string to specify as the folder or file name so that the name correlates with the content of the input image data.

To solve the above problems, the present invention comprises: input means for inputting image data obtained by reading a plurality of sets of documents, each set consisting of a plurality of pages; display control means for comparing, page by page, the character strings contained in first image data corresponding to a first document set with the character strings contained in second image data corresponding to a second document set among the image data input by the input means, and for displaying, in different display forms, character strings determined to match at a position common to the first and second image data and character strings not determined to match; and control means for performing control so as to store the first image data in a storage unit using a character string designated from among the character strings displayed by the display control means.

According to the present invention, it becomes easy to increase the correlation between the content of input image data and the name of the folder or file in which that data is stored. As a result, image data can be digitized more efficiently, the digitized data is easier to recognize afterwards, and the digitized image data becomes easier to manage.

FIG. 1 is a diagram showing the configuration of the first embodiment.
FIG. 2 is a block diagram of the character determination unit.
FIG. 3 is a block diagram of the operation panel unit.
FIG. 4 is a diagram showing an example of a character determination result.
FIG. 5 shows an example of input image data.
FIG. 6 shows an example of the display form in the first embodiment.
FIG. 7 shows an example of folder generation in the first embodiment.
FIG. 8 is a control flowchart of the first example.
FIG. 9 is a diagram showing the configuration of the second example.
FIG. 10 shows an example of file generation.
FIG. 11 is a control flowchart of the second example.
FIG. 12 is a control flowchart of the third example.
FIG. 13 shows an example of the display form in the third example.
FIG. 14 is a diagram showing the configuration of the page selection unit.
FIG. 15 is a diagram showing an input image.
FIG. 16 is a diagram showing an input image.
FIG. 17 is a control flowchart of the page selection unit.

[First Example]
Embodiments for carrying out the present invention will now be described with reference to the drawings. FIG. 1 shows the configuration of an image processing apparatus required to implement this example. The image processing apparatus is built around at least a main control unit 40 and includes an image input unit 10, a storage unit 20, and an operation panel unit 30. In FIG. 1, the image input unit 10 receives image data obtained by optically reading a paper document, or image data input via a communication network (not shown). The storage unit 20 stores the image data input by the image input unit 10 and control information for the main control unit 40, as well as previously input image data and intermediate data, such as feature quantities, generated by analyzing that data.

The storage unit 20 may be provided within the image processing apparatus or in another apparatus that can be connected to the image processing apparatus.

The operation panel unit 30 consists of an input unit for giving operation instructions to the image processing apparatus and a display unit for displaying its operating state. FIG. 3 shows the detailed configuration of the operation panel unit 30. In FIG. 3, the display unit 301 is, for example, a touch panel. The numeric keypad 302 is used to set operation-related setting items, and the start key 303 is pressed to instruct the image processing apparatus to start an operation. The stop key 304 instructs the image processing apparatus to stop its operation. The reset key 305 initializes the settings made with the numeric keypad 302. The operation mode setting key 306 instructs the setting of the operation mode of the image processing apparatus. These keys are not limited to hard keys and may be soft keys displayed on the display unit 301.

The main control unit 40 in FIG. 1 includes a CPU, a ROM storing the CPU's startup program, a RAM serving as the execution area for that program, an HDD or SSD used as storage for intermediate data generated from previously input image data, and an input/output interface (none of which are shown). The main control unit 40 controls the image input unit 10, the storage unit 20, and the operation panel unit 30. The main control unit 40 also includes a character determination unit 401 that determines the characters contained in the image data input from the image input unit 10, and a match determination unit 402 that determines matching or differing portions of the input image data with reference to their layout positions. The main control unit 40 further includes a display control unit 403 that controls the information displayed on the display unit 301 of the operation panel unit 30 according to the determination result of the match determination unit 402, and a folder generation unit 404 that generates folders according to the item selected from the content displayed by the display control unit 403.

The main control unit 40 further includes a page selection unit 406 that selects, from the image data input from the image input unit 10, the pages on which matching or mismatching portions are to be determined.

FIG. 2 shows the internal configuration of the character determination unit 401. The character determination unit 401 is based mainly on OCR (Optical Character Reader), that is, optical character recognition. When the input image data has been obtained by optically reading a paper document, OCR identifies characters by matching data extracted from the input image data against patterns stored in advance, and converts them into text data for output. Accordingly, the character determination unit 401 comprises the processing units that make up the OCR: a layout analysis unit 4011, a cutout unit 4012, a feature extraction unit 4013, a collation unit 4014, a dictionary unit 4015, and a temporary storage unit 4016. When the input image data is obtained via a communication network, it suffices to provide a text data analysis unit (not shown) that analyzes the text data contained in the image data.

The layout analysis unit 4011 separates the image data input from the image input unit 10 into character areas and image areas, analyzes how the character areas are arranged into blocks, and determines the order in which characters are recognized. When multiple pages of image data with the same form are input, the layout analysis unit 4011 finds the same arrangement of character areas and image areas on every page. Consequently, characters are recognized in the same order on each page.

In other words, if subsequent conversion into character data follows the analysis result of the layout analysis unit 4011, matching and differing characters and character strings can be identified across multiple documents.

The cutout unit 4012 first divides each block of character areas detected by the layout analysis unit 4011 into lines, and then divides each line into individual characters. The feature extraction unit 4013 extracts the features of each separated character, for example which vertical, horizontal, and diagonal strokes it is composed of. The collation unit 4014 collates the features extracted by the feature extraction unit 4013 with the information stored in the dictionary unit 4015 and converts the character data into text data. The text data determined by the collation unit 4014 is stored in the temporary storage unit 4016.

When the input image data is obtained via a communication network, it already contains text data, so that text data is stored directly in the temporary storage unit 4016. In addition to the layout information, character-area coordinates, and character and line counts obtained in this way, information on the size and orientation of the image (for example, A4 landscape) is also stored in the temporary storage unit 4016 as feature quantities. A format collation unit (not shown) uses these feature quantities to compare the document with other stored documents. The image size can be determined from the numbers of vertical and horizontal pixels and the data resolution (for example, in dpi, dots per inch), and the orientation from the direction of the character recognition result.
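The size calculation mentioned above is simple arithmetic: physical length in mm = pixels / dpi × 25.4. The following Python sketch is a hypothetical helper, not part of the disclosed apparatus. It assumes ISO 216 paper sizes and a 5 mm tolerance, and it infers orientation from the aspect ratio only, whereas the text derives orientation from the direction of the character recognition result.

```python
from __future__ import annotations

MM_PER_INCH = 25.4

# ISO 216 sizes in millimetres as (short side, long side); illustrative subset.
PAPER_SIZES_MM = {
    "A3": (297.0, 420.0),
    "A4": (210.0, 297.0),
    "A5": (148.0, 210.0),
}


def pixels_to_mm(pixels: int, dpi: float) -> float:
    """Convert a pixel count to millimetres at the given scan resolution."""
    return pixels / dpi * MM_PER_INCH


def estimate_paper(width_px: int, height_px: int, dpi: float,
                   tolerance_mm: float = 5.0) -> tuple[str, str] | None:
    """Return (size name, orientation), or None if no known size fits.

    Orientation comes from the aspect ratio only (an assumption); the
    apparatus in the text uses the OCR text direction instead.
    """
    w_mm = pixels_to_mm(width_px, dpi)
    h_mm = pixels_to_mm(height_px, dpi)
    short_side, long_side = sorted((w_mm, h_mm))
    orientation = "landscape" if w_mm > h_mm else "portrait"
    for name, (s, l) in PAPER_SIZES_MM.items():
        if abs(short_side - s) <= tolerance_mm and abs(long_side - l) <= tolerance_mm:
            return name, orientation
    return None


# An A4 page scanned at 300 dpi in landscape is about 3508 x 2480 pixels.
print(estimate_paper(3508, 2480, 300))  # → ('A4', 'landscape')
```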

The structure in which text data is stored in the temporary storage unit 4016 is described below with reference to FIG. 4.

FIG. 4(a) shows an example of acquired image data. Through analysis by the layout analysis unit 4011, the character areas in the acquired image data of FIG. 4(a) are identified as the layout indicated by broken lines (1) to (13) in the figure. The cutout unit 4012 cuts out the blocks one line at a time, starting from the block with the smallest number, and then cuts each line into individual characters. The feature extraction unit 4013 extracts the features of each cut-out character, and the collation unit 4014 converts them into text data by collating them with the information in the dictionary unit 4015. If each entry of text data stored in the temporary storage unit 4016 is defined as one line bounded by blank (white data) areas in the converted text data, the text data is stored in the temporary storage unit 4016 as shown in FIG. 4(b).

When the main control unit 40 has finished analyzing the feature quantities of the image data obtained by reading one document and converting that data into text data, it generates intermediate data (such as temporary file data) that includes the feature quantities and text data stored in the temporary storage unit 4016. The main control unit 40 then stores the generated intermediate data in the storage unit 20, associated with the feature quantities and text data read from the temporary storage unit 4016.
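As an illustration of what one page's intermediate data might hold, here is a minimal Python sketch. The class and field names (PageFeatures, IntermediateData, build_intermediate) are hypothetical: the text lists the kinds of feature quantities (layout, character-area coordinates, character and line counts, image size and orientation) but does not define a concrete format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PageFeatures:
    """Hypothetical feature quantities for one page."""
    paper_size: str                                   # e.g. "A4"
    orientation: str                                  # e.g. "landscape"
    char_area_boxes: list[tuple[int, int, int, int]]  # (x, y, w, h) per block
    num_chars: int
    num_lines: int


@dataclass
class IntermediateData:
    """One page's record: feature quantities plus text in layout order.

    The list index of each text line stands in for its position, because
    layout analysis gives pages of the same form the same reading order.
    """
    page_id: str
    features: PageFeatures
    text_lines: list[str] = field(default_factory=list)


def build_intermediate(page_id: str, text_lines: list[str],
                       boxes: list[tuple[int, int, int, int]],
                       paper_size: str, orientation: str) -> IntermediateData:
    """Bundle recognized text and derived counts into one record."""
    features = PageFeatures(
        paper_size=paper_size,
        orientation=orientation,
        char_area_boxes=list(boxes),
        num_chars=sum(len(line) for line in text_lines),
        num_lines=len(text_lines),
    )
    return IntermediateData(page_id, features, list(text_lines))


rec = build_intermediate("5a", ["Invoice", "AAA Co. Attn."],
                         [(40, 30, 300, 40)], "A4", "portrait")
print(rec.features.num_lines, rec.features.num_chars)  # → 2 20
```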

The operation of the main control unit 40 is described below using example image data.

FIGS. 5(a) to 5(c) show examples of image data input to the image input unit 10. All three are obtained by reading documents of the same form (the same layout), as is typical of business forms and slips, and differ only in part of their content. For example, the character string "御請求書" (Invoice), the honorific "御中" (Attn.) following the addressee, and the issuer's company name and address are identical in all three. On the other hand, the amount billed and the contents of the person-in-charge field differ from one document to another.

FIGS. 5(d) to 5(f) show the results of the determination performed by the character determination unit 401 on the text data extracted from the image data of FIGS. 5(a) to 5(c). The main control unit 40 stores in the storage unit 20 the text data of FIGS. 5(d) to 5(f), together with intermediate data representing the image data of FIGS. 5(a) to 5(c) and their feature quantities. In this example, the text data is read from the temporary storage unit 4016 and stored in the storage unit 20 only after all character data extracted from the images of FIGS. 5(a) to 5(c) input to the image input unit 10 has been converted into text data. However, the way the main control unit 40 controls the storage unit 20 is not limited to this: the text data held in the temporary storage unit 4016 may instead be stored in the storage unit 20 each time a page of image data is input to the image input unit 10.

After the main control unit 40 has stored in the storage unit 20 the text data extracted from all of the image data input to the image input unit 10, together with the digitized intermediate files, it performs match determination control. In this control, the match determination unit 402 compares the text data extracted from each set of input image data stored in the storage unit 20. Because the order of the text data stored in the storage unit 20 is based on the determination result of the layout analysis unit 4011 of the character determination unit 401, identical text data stored at a different position in that order is not judged to match. In other words, the order of the text data extracted from the input image data carries positional information about the input image data. Thus, even if identical text data is extracted from two pages, it can be judged to lie at different positions if the extraction order differs. Match determination is therefore based on both the content and the order (positional information) of the text data stored in the storage unit 20.
In this description, the positional information of the text data is based on the order in which the text data was extracted from the input image data, but the present embodiment is not limited to this. For example, positional information for the extracted text data may be created separately and stored in the storage unit 20 in association with the extracted text data, and the match determination unit 402 may then determine match or mismatch using both the extracted text data and its positional information.
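The order-based rule can be sketched as follows, assuming each page's text data is a Python list in layout order so that the list index serves as the positional information. The function name classify_positions and the sample strings are hypothetical, and this sketch uses exact string equality only.

```python
from __future__ import annotations


def classify_positions(pages: list[list[str]]) -> list[tuple[int, str, bool]]:
    """For each position of the reference page (pages[0]), report whether
    every other page has identical text at that same position.

    Identical text at a different index is NOT a match, mirroring the
    order-based (position-aware) rule described above. A page that is too
    short to have the position counts as a mismatch.
    """
    if not pages:
        return []
    reference = pages[0]
    result = []
    for i, text in enumerate(reference):
        same = all(i < len(p) and p[i] == text for p in pages[1:])
        result.append((i, text, same))
    return result


pages = [
    ["Invoice", "AAA Co., Ltd.", "Attn.", "10,000"],
    ["Invoice", "BBB Inc.", "Attn.", "25,000"],
    ["Invoice", "CCC Builders", "Attn.", "7,500"],
]
print(classify_positions(pages))
# → [(0, 'Invoice', True), (1, 'AAA Co., Ltd.', False), (2, 'Attn.', True), (3, '10,000', False)]
```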

Match and mismatch determination is now described in detail. Taking the input image data of FIG. 5(a) as the reference, consider the text data in ranges 501, 503, and 505 shown in FIG. 5(d), which is the text data determination result extracted from that input image data. The text data in ranges 501, 503, and 505 matches the text data in ranges 501, 503, and 505 of FIGS. 5(e) and 5(f). Therefore, the portions corresponding to ranges 501, 503, and 505 of the text data extracted from the input image data of FIGS. 5(a), 5(b), and 5(c) can be determined to match. On the other hand, the text data in ranges 502, 504, and 506 shown in FIG. 5(d) does not match the text data in ranges 502, 504, and 506 of FIGS. 5(e) and 5(f). Therefore, the portions corresponding to ranges 502, 504, and 506 of the text data extracted from the input image data of FIGS. 5(a), 5(b), and 5(c) can be determined not to match.

The match determination by the match determination unit 402 does not have to require an exact match with the text data stored in the storage unit 20. Because OCR converts character data into text data by collating the features extracted by the feature extraction unit 4013 with the data stored in the dictionary unit 4015, individual characters may be misrecognized. The final match determination may therefore be based on, for example, the number of characters in a string that are judged to match. For instance, if 8 of the 10 characters of a text string match, the string may be treated as a match.
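One way to realize the "8 of 10 characters" rule is a position-by-position character agreement ratio, sketched below in Python. The function name and the choice of position-wise comparison are assumptions; the text only says that the decision may be based on the number of matching characters.

```python
def fuzzy_match(a: str, b: str, min_ratio: float = 0.8) -> bool:
    """Treat two OCR strings as matching if enough characters agree.

    Characters are compared position by position; extra characters in the
    longer string count as disagreements. min_ratio=0.8 reproduces the
    "8 of 10 characters" example.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return True  # two empty strings are trivially identical
    agree = sum(1 for x, y in zip(a, b) if x == y)
    return agree / longest >= min_ratio


print(fuzzy_match("ABCDEFGHIJ", "ABCDEFGHXY"))  # → True (8 of 10 agree)
print(fuzzy_match("ABCDEFGHIJ", "ABCDEFGXYZ"))  # → False (7 of 10 agree)
```

Position-wise comparison breaks down when OCR drops or inserts a character, because every later character shifts. An edit-distance style ratio such as `difflib.SequenceMatcher(None, a, b).ratio()` from the Python standard library is a common, more forgiving alternative.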

When the match determination control by the match determination unit 402 ends, the main control unit 40 performs display control through the display control unit 403. The display control applies the match determination results of the match determination unit 402 to the intermediate file data obtained by digitizing the image data input to the image input unit 10. FIG. 6(a) shows the display state of the operation panel unit 30 when the input image data of FIG. 5(a) is displayed.

In FIG. 6(a), character strings whose content at a position common to all the read forms of FIGS. 5(a) to 5(c) has been determined by the match determination unit 402 to be identical in every form are displayed with the shading indicated by reference numeral 601. Character strings whose content at such a common position is not identical in every form (mismatch) are displayed with the shading indicated by reference numeral 602. The way matches and mismatches are displayed is not limited to this form; any method that lets matched and mismatched character strings be distinguished may be used. For example, if the display unit 301 of the operation panel unit 30 supports color, matched and mismatched strings may be shown in different colors. If the display unit 301 can display only a single color, a display form such as steady lighting versus blinking may be used.

Furthermore, not every matched and mismatched character string needs to be displayed. For example, the character size (font size) and the character string length to be targeted for match/mismatch display may be registered in advance in the storage unit 20, and only the matched and mismatched strings that satisfy the registered conditions may be displayed in a distinguishable form. FIG. 6(b) is a display example in which the display of matched and mismatched strings is limited to characters (fonts) larger than a predetermined threshold and to strings of limited length. Because only characters larger than the threshold are targeted, all the small character strings in FIG. 6(a) are hidden. Because the string length is limited, long strings such as dates and boilerplate phrases are also hidden.
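The size and length conditions amount to a simple filter over the candidate strings. In this Python sketch, the Candidate type, the use of character height in pixels as the "font size", and the threshold values are all assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    font_size: float  # assumed unit: character height in pixels
    matched: bool     # True if identical at this position on every page


def filter_candidates(cands: list[Candidate], min_font_size: float,
                      max_length: int) -> list[Candidate]:
    """Keep strings whose characters are larger than the threshold and
    whose length does not exceed the registered limit."""
    return [c for c in cands
            if c.font_size > min_font_size and len(c.text) <= max_length]


cands = [
    Candidate("Invoice", 24, True),
    Candidate("AAA Co., Ltd.", 14, False),
    Candidate("Tokyo 1-2-3 Chiyoda", 8, True),                        # too small
    Candidate("Payment is due by the end of next month.", 14, True),  # too long
]
print([c.text for c in filter_candidates(cands, 10, 20)])  # → ['Invoice', 'AAA Co., Ltd.']
```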

FIG. 6(c) is an example of the display state when the keywords "御請求書" (Invoice), "株式会社" (kabushiki kaisha, i.e. Co., Ltd.), and "（株）" (its abbreviation) are registered in the storage unit 20 as targets for match/mismatch display, and match or mismatch is shown in a distinguishable form only for character strings containing these registered keywords. With this control, only the characteristic matched and mismatched strings of the input image data are displayed, which makes selection easier for the user.
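Keyword registration can likewise be sketched as a substring filter. The keyword tuple mirrors the example in FIG. 6(c); the function name and the (text, matched) pair representation are hypothetical.

```python
from __future__ import annotations

# Keywords from the FIG. 6(c) example: Invoice, Co., Ltd., and its abbreviation.
REGISTERED_KEYWORDS = ("御請求書", "株式会社", "（株）")


def filter_by_keywords(candidates: list[tuple[str, bool]],
                       keywords: tuple[str, ...] = REGISTERED_KEYWORDS
                       ) -> list[tuple[str, bool]]:
    """Keep only (text, matched) pairs whose text contains any registered keyword."""
    return [(text, matched) for text, matched in candidates
            if any(k in text for k in keywords)]


sample = [("御請求書", True), ("株式会社AAA", False), ("御中", True), ("2014年12月4日", False)]
kept = filter_by_keywords(sample)  # keeps the 御請求書 and 株式会社AAA entries
```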

When the display control by the display control unit 403 ends, the main control unit 40 waits for the user to select one of the matched or mismatched portions displayed on the display unit 301. The subsequent control by the main control unit 40 differs depending on whether a matched portion or a mismatched portion is selected. How a portion is selected depends on the capabilities of the display unit 301. If the display unit 301 is a touch panel, the main control unit 40 switches control according to which matched or mismatched portion was pressed, as determined from the pressed coordinate position. If the display unit 301 is not a touch panel, the main control unit 40 switches control according to the matched or mismatched portion selected by operating keys (not shown) on the operation panel unit 30.
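On a touch panel, deciding which displayed string was pressed is a hit test of the touch coordinates against the on-screen rectangles of the highlighted strings. The Python sketch below assumes that each highlighted string is stored with its layout position, its match flag, and an (x, y, width, height) box in panel coordinates; these names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DisplayedRegion:
    position: int                   # layout-order index of the string
    text: str
    matched: bool                   # drives which storage branch runs next
    box: tuple[int, int, int, int]  # (x, y, width, height) on the panel


def hit_test(regions: list[DisplayedRegion], x: int, y: int) -> DisplayedRegion | None:
    """Return the highlighted region containing the touch point, if any."""
    for r in regions:
        rx, ry, rw, rh = r.box
        if rx <= x < rx + rw and ry <= y < ry + rh:
            return r
    return None


regions = [
    DisplayedRegion(0, "Invoice", True, (100, 20, 200, 40)),
    DisplayedRegion(1, "AAA Co., Ltd.", False, (20, 80, 180, 30)),
]
print(hit_test(regions, 150, 30).text)  # → Invoice
```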

When a matched portion is selected, the main control unit 40 performs control using the selected matched character string; when a mismatched portion is selected, it performs control using the mismatched character strings located at the selected position. This example describes control in which the folder generation unit 404 switches the method of generating the folders that store the digitized input image data according to whether a matched portion or a mismatched portion was selected.

FIG. 7A shows the result of folder generation by the folder generation unit 404 when the matching character string "Invoice" (御請求書) is selected. When a character string determined to match across the plurality of read image data is selected, the folder generation unit 404 uses the selected string as the name of the folder to create in the storage unit 20. For example, if the selected matching string is "Invoice", the folder is named "Invoice" (701). In addition, when a matching string is selected, the digitized intermediate file data input through the image input unit 10 is stored in the created folder as an integrated file (a single file). That is, the intermediate files of FIGS. 5A, 5B, and 5C are merged, and the final digitized file, named using the selected matching string, is stored in the created folder (702). The selected matching string is also used as part of the file name stored in the storage unit 20: for "Invoice", for example, the file name is "Invoice" followed by a serial number or the like (e.g., Invoice_001). This control makes it easy to generate folder and file names from the character string the user intends.
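As a rough sketch of the FIG. 7A behavior, the function below computes the folder name, file name, and merge sources without touching a real file system. The `StoragePlan` type, the intermediate file names, and the three-digit serial format are assumptions chosen to reproduce the "Invoice_001" example; the actual merging of intermediate files into one output file is left out.

```python
from dataclasses import dataclass


@dataclass
class StoragePlan:
    folder: str         # folder to create in the storage unit
    file_name: str      # name of the file stored in that folder
    sources: list[str]  # intermediate files merged into that file


def plan_for_matching_selection(selected: str, intermediate_files: list[str],
                                serial: int = 1) -> StoragePlan:
    """Matching string selected (FIG. 7A): one folder holding one integrated file.

    The folder is named after the selected string, and the integrated file gets
    the same string plus a serial number, e.g. "Invoice_001".
    """
    if not intermediate_files:
        raise ValueError("nothing to integrate")
    return StoragePlan(
        folder=selected,
        file_name=f"{selected}_{serial:03d}",  # three-digit serial is an assumption
        sources=list(intermediate_files),      # every document goes into one file
    )
```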

Next, control by the folder generation unit 404 when a non-matching portion is selected from the plurality of read image data will be described. FIG. 7B shows the result of folder generation by the folder generation unit 404 when the non-matching character string "AAA Co., Ltd." is selected. When a non-matching string is selected, the folder generation unit 404 uses, as the names of the folders to create in the storage unit 20, the character strings found at the position of the selected string in each document. In this embodiment, the selected non-matching string is "AAA Co., Ltd.". Therefore, the string at the position of "AAA Co., Ltd." in each image data, namely the string that precedes the honorific "御中" (onchū, used after an addressee organization name) in each input image data, becomes the name of the corresponding folder (703, 704, 705). That is, folders named "AAA Co., Ltd.", "BBB Co., Ltd.", and "CCC Construction" are created. In addition, when a non-matching string is selected, the digitized image data input through the image input unit 10 is stored as separate files in the created folders.

That is, for the input image data of FIG. 5A, the digitized intermediate file data of FIG. 5A (706) is stored in the folder named "AAA Co., Ltd.". Similarly, for the input image data of FIG. 5B, the digitized intermediate file data (707) is stored in the folder named "BBB Co., Ltd.", and for FIG. 5C, the digitized intermediate file data (708) is stored in the folder named "CCC Construction". The file name in each created folder likewise uses, as part of the name, the string found at the selected mismatch position in the corresponding document. For example, for "AAA Co., Ltd.", a file named "AAA Co., Ltd." followed by a serial number or the like (e.g., AAA Co., Ltd._001) is stored in the folder named "AAA Co., Ltd." (the intermediate file data is renamed and stored). This control makes it easy to generate both the folder names and the file names from the character strings the user intends.
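The per-document variant of FIG. 7B can be sketched the same way. Here each document's character determination result is modeled, hypothetically, as a mapping from a (line, column) position to the recognized string; the selected position is looked up in every document, so each invoice's addressee string becomes that document's folder and file name. The `RecognizedDocument` and `StoragePlan` types and the serial format are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class RecognizedDocument:
    intermediate_file: str               # digitized result for one document
    strings: dict[tuple[int, int], str]  # hypothetical: (line, column) -> recognized string


@dataclass
class StoragePlan:
    folder: str         # folder to create in the storage unit
    file_name: str      # renamed intermediate file stored in that folder
    sources: list[str]  # the single intermediate file for this document


def plan_for_mismatching_selection(position: tuple[int, int],
                                   documents: list[RecognizedDocument],
                                   serial: int = 1) -> list[StoragePlan]:
    """Non-matching string selected (FIG. 7B): one folder and one file per document."""
    plans = []
    for doc in documents:
        # The string at the selected mismatch position differs per document;
        # a KeyError here means that document has no string at the position.
        text = doc.strings[position]
        plans.append(StoragePlan(folder=text,
                                 file_name=f"{text}_{serial:03d}",
                                 sources=[doc.intermediate_file]))
    return plans
```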

FIG. 14 shows the configuration of the page selection unit 406. The page selection unit 406 is executed by the CPU in the main control unit 40. When the input images consist of a plurality of documents, each made up of several pages, the page number setting unit 4051 sets the number of pages that make up one document. The main control unit 40 then compares the first pages of the documents with one another. FIG. 15 shows a concrete example of input images. An invoice 1501 addressed to "AAA Co., Ltd." forms a two-sheet document set together with a fee breakdown 1502. An invoice 1503 addressed to "BBB Co., Ltd." forms a two-sheet set with a fee breakdown 1504, and an invoice 1505 addressed to "CCC Construction" forms a two-sheet set with a fee breakdown 1506. When documents such as those in FIG. 15 are input, the user specifies two pages in the page number setting unit 4051. The main control unit 40 then compares the invoices 1501, 1503, and 1505. When these documents are stored, each pair of pages is treated as one document, and each document is stored as one file. If the image comparison yields no non-matching portion, the page selection unit 406 performs the comparison using another page in each document.

The comparison page position setting unit 4052 in FIG. 14 sets the position, within each document, of the page to be compared. The main control unit 40 compares the images of the pages at the position set by the comparison page position setting unit 4052. If the second page is specified for the documents in FIG. 15, the main control unit 40 compares the fee breakdowns 1502, 1504, and 1506, and stores each document as one file.

In this way, documents consisting of two pages are compared page by page, at the same page position in each document.
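The fixed-page-count case of FIGS. 14 and 15 amounts to slicing the page stream into equal groups and picking the same page position from each group. A minimal sketch follows; the function names are hypothetical, page positions are 1-based to match "second page" in the text, and `choose_position` models the fallback of comparing another page when no non-matching portion is found, with the mismatch test supplied by the caller.

```python
from typing import Callable, Optional, Sequence, TypeVar

P = TypeVar("P")


def split_by_page_count(pages: Sequence[P], pages_per_document: int) -> list[list[P]]:
    """Group the scanned pages into documents of a fixed, user-specified length."""
    if pages_per_document < 1:
        raise ValueError("pages_per_document must be at least 1")
    if len(pages) % pages_per_document:
        raise ValueError("page total is not a multiple of pages_per_document")
    return [list(pages[i:i + pages_per_document])
            for i in range(0, len(pages), pages_per_document)]


def comparison_pages(documents: list[list[P]], position: int = 1) -> list[P]:
    """Pick the page at a 1-based position in every document (1 = first page)."""
    return [document[position - 1] for document in documents]


def choose_position(documents: list[list[P]],
                    has_mismatch: Callable[[list[P]], bool]) -> Optional[int]:
    """Try page positions in order until the compared pages show a difference."""
    shortest = min(len(document) for document in documents)
    for position in range(1, shortest + 1):
        if has_mismatch(comparison_pages(documents, position)):
            return position
    return None  # no page position distinguishes the documents
```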

The approximation determination unit 4053 in FIG. 14 calculates, from image feature amounts, how closely each subsequent page approximates the first page of the input images, and determines whether each page is an approximate page. For example, for documents such as those in FIG. 15, the pages 1503 and 1505, which approximate the leading invoice 1501, are selected as approximate pages and used as comparison pages. The approximation determination unit 4053 computes feature amounts by means such as line and contour detection or histograms, executed by the CPU or an ASIC in the main control unit 40; any method may be used. The threshold setting unit 4054 in FIG. 14 sets a user-specified threshold used to judge, against the feature amount of the first page, whether other pages are approximate pages. Furthermore, the page selection unit 406 uses not only the approximate pages but also the last page of each document as targets of the match/mismatch comparison.
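The text leaves the feature amount open (line or contour detection, histograms, and so on). As one possible choice, the sketch below uses a normalized grayscale histogram and histogram intersection as the degree of approximation; the bin count, the pixel representation (a flat list of 0 to 255 values), and the default threshold of 0.8 are all assumptions, not values from the source.

```python
from typing import Optional

DEFAULT_THRESHOLD = 0.8  # hypothetical default; the source gives no value


def histogram(pixels: list[int], bins: int = 16) -> list[float]:
    """Normalized grayscale histogram of a page given as 0-255 pixel values."""
    if not pixels:
        raise ValueError("empty page")
    counts = [0] * bins
    for value in pixels:
        counts[min(value * bins // 256, bins - 1)] += 1
    return [count / len(pixels) for count in counts]


def similarity(a: list[float], b: list[float]) -> float:
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(x, y) for x, y in zip(a, b))


def is_approximate_page(first_page: list[int], page: list[int],
                        threshold: Optional[float] = None) -> bool:
    """Judge a page approximate to the first page (user threshold, else default)."""
    limit = DEFAULT_THRESHOLD if threshold is None else threshold
    return similarity(histogram(first_page), histogram(page)) >= limit
```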

FIG. 16 is a second diagram showing a concrete example of input images. An invoice 1601 addressed to "AAA Co., Ltd." forms a three-sheet document set together with fee breakdowns 1602 and 1603. An invoice 1604 addressed to "BBB Co., Ltd." is a single-sheet document. An invoice 1605 addressed to "CCC Construction" forms a two-sheet set with a fee breakdown 1606. When documents such as those in FIG. 16 are input, the page number setting unit 4051 sets the target pages 1601, 1604, and 1605 determined by the approximation determination unit 4053. The main control unit 40 compares the invoices 1601, 1604, and 1605, and stores each document as one file. Here, the page selection unit 406 determines that pages 1601 to 1603 form one document, page 1604 forms one document, and pages 1605 and 1606 form one document. Because the last page of each document may contain a standard form, the page selection unit 406 also compares the last pages with one another when the user so specifies. However, the item to be compared may not lie at the same position on each image; for example, the items of interest may be the totals on the last pages 1603 and 1606. In such a case, the page selection unit 406 performs the comparison after aligning the document content preceding the trailing-edge margin of each last-page image to the position where the trailing-edge margin begins.
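Two ideas from FIG. 16 are sketched below under simplifying assumptions: splitting the page stream into variable-length documents wherever an approximate (leading) page appears, and aligning last pages by where their trailing-edge margin begins. Pages are modeled hypothetically as lists of text rows, with an empty or whitespace-only row counted as margin; the real apparatus would work on image data.

```python
from typing import Callable, Sequence, TypeVar

P = TypeVar("P")


def segment_documents(pages: Sequence[P],
                      is_leading_page: Callable[[P], bool]) -> list[list[P]]:
    """Start a new document at each approximate page; other pages join the previous one."""
    documents: list[list[P]] = []
    for page in pages:
        if not documents or is_leading_page(page):
            documents.append([page])    # the first page always opens a document
        else:
            documents[-1].append(page)  # belongs to the preceding page's document
    return documents


def last_pages(documents: list[list[P]]) -> list[P]:
    return [document[-1] for document in documents]


def content_end(rows: Sequence[str]) -> int:
    """Index of the first row of the trailing-edge (bottom) blank margin."""
    end = len(rows)
    while end > 0 and not rows[end - 1].strip():
        end -= 1
    return end


def align_to_trailing_margin(pages_rows: Sequence[Sequence[str]]) -> list[list[str]]:
    """Shift each page down so its content ends on the same row, dropping the margin."""
    ends = [content_end(rows) for rows in pages_rows]
    target = max(ends, default=0)
    return [[""] * (target - end) + list(rows[:end])
            for rows, end in zip(pages_rows, ends)]
```

After alignment, the totals sit on the same row of every last page, so a position-based comparison can find them.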

Next, the control flow of the main control unit 40 will be described. FIG. 8 shows the control flow in the first embodiment. The processing in each step of this flow is carried out by loading a computer-executable program describing the following procedure from a ROM (not shown) into a RAM (not shown) and executing it with a CPU (not shown).

When the user selects the operation mode for digitizing a plurality of image data having the same form (step S101), the image input unit 10 accepts image data (step S102).

The accepted image data is scanned image data if it is input by optical reading, or received data if it is input via a communication network. In step S103, the image input unit 10 determines whether all the image data has been input; once it determines that all the image data has been input, the process proceeds to step S104.

The image data is input by reading a plurality of document sets, each set consisting of originals of a plurality of pages.

In step S104, the main control unit 40 performs character determination processing with the character determination unit 401. This processing performs layout analysis, data extraction, feature extraction, matching, and storage of the matching results for each sheet input by the image input unit 10.

If the main control unit 40 determines in step S105 that the operation mode set by the user is not the mode for digitizing a plurality of image data having the same form, the process proceeds to step S112. In step S112, a predetermined name is used for the folder to be created, and a digitized file that uses the text data obtained by the character determination unit 401 is stored in the created folder. If the main control unit 40 determines in step S105 that the operation mode set by the user is the mode for digitizing a plurality of image data having the same form, the process proceeds to step S120. After the flow of step S120, described later, has determined the pages to be processed, the process proceeds to step S106, where the match determination unit 402 performs match determination processing. When the match determination processing is finished, in step S107 the display control unit 403 of the main control unit 40 controls the display of the matching and non-matching portions.

If the user selects a matching portion in step S108, the process proceeds to step S109. The folder generation unit 404 of the main control unit 40 then creates a folder reflecting the content of the selected matching portion and stores in it a single file into which the intermediate files of all the input image data are merged. If, on the other hand, a non-matching portion is selected in step S108, the process proceeds to step S110. If it is determined in step S110 that a differing portion has been selected, the process proceeds to step S111. In step S111, the folder generation unit 404 of the main control unit 40 creates, for each input image data, a folder reflecting the content at the selected mismatch position, and stores the corresponding per-document intermediate file in each folder. The intermediate files stored in steps S109 and S111 are named according to the selection of a matching or non-matching portion, as already described.
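To make the branching of FIG. 8 easier to follow, the sketch below returns the sequence of step numbers visited for one run. It is only a trace model: the input loop of step S103 is collapsed into a single entry, and encoding the S108 selection as the strings "match" and "mismatch" is an assumption.

```python
from typing import Optional


def fig8_step_trace(same_form_mode: bool, selection: Optional[str] = None) -> list[str]:
    """Return the FIG. 8 steps visited for one run (hypothetical trace model)."""
    steps = ["S101", "S102", "S103", "S104", "S105"]  # mode, input, OCR, mode check
    if not same_form_mode:
        return steps + ["S112"]                  # predetermined folder name
    steps += ["S120", "S106", "S107", "S108"]    # page selection, match check, display
    if selection == "match":
        return steps + ["S109"]                  # one folder, one integrated file
    if selection == "mismatch":
        return steps + ["S110", "S111"]          # one folder per document
    raise ValueError("S108 needs a 'match' or 'mismatch' selection")
```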

FIG. 17 is a flowchart showing the details of the processing executed in step S120, that is, the control flow of the page selection unit 406. This flow determines whether each read page is a page to be compared.

In step S401, the main control unit 40 determines whether the user has specified the number of pages per document. If so, the process proceeds to step S402; if automatic detection has been specified, the process proceeds to step S403. In step S402, the main control unit 40 determines whether the current page is a page to be compared; if it is, the process proceeds to the next step, and if not, to step S408. In step S403, the process proceeds to step S404 if a threshold has been specified, and to step S405 if not. In step S404, the main control unit 40 acquires the threshold specified by the user and proceeds to step S406. In step S405, it acquires the default threshold and proceeds to step S406. In step S406, the main control unit 40 compares feature amounts using the acquired threshold and proceeds to step S407. In step S407, it determines from the result of the feature comparison whether the page is an approximate page of the comparison page; if so, the process proceeds to the next step, and if not, to step S408. In step S408, the main control unit 40 temporarily records the page as belonging to the document of the preceding page, and then reads the next page (step S409).
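Steps S401 to S409 can be condensed into a single pass over the pages, as in the hypothetical `select_pages` below. The similarity function is supplied by the caller (for example, a histogram comparison of the kind the approximation determination unit 4053 might use), and `DEFAULT_THRESHOLD = 0.8` stands in for the unspecified default threshold of step S405. A page flagged `False` corresponds to step S408, where it is recorded with the preceding page's document.

```python
from typing import Callable, Optional, Sequence, TypeVar

P = TypeVar("P")
DEFAULT_THRESHOLD = 0.8  # hypothetical stand-in for the default threshold of S405


def select_pages(pages: Sequence[P],
                 pages_per_document: Optional[int],
                 similarity: Callable[[P, P], float],
                 user_threshold: Optional[float] = None,
                 compare_position: int = 1) -> list[tuple[P, bool]]:
    """Flag each page as a comparison page (True) or not (False), following FIG. 17."""
    if not pages:
        return []
    # S403 to S405: a user-specified threshold takes precedence over the default.
    threshold = DEFAULT_THRESHOLD if user_threshold is None else user_threshold
    first = pages[0]
    flagged = []
    for index, page in enumerate(pages):          # S409: read the next page
        if pages_per_document is not None:        # S401: page count specified
            is_target = index % pages_per_document == compare_position - 1  # S402
        else:                                     # S401: automatic detection
            is_target = index == 0 or similarity(first, page) >= threshold  # S406, S407
        # False corresponds to S408: the page joins the preceding page's document.
        flagged.append((page, is_target))
    return flagged
```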

As described above, in the first embodiment, when a plurality of image data whose pages do not all share the same format are digitized, folders are built per document in a form that corresponds to the user's selection (a matching portion or a non-matching portion). Furthermore, the names and form of the digitized files stored in the folders also match the user's selection, which improves convenience for the user.

[Second Embodiment]
The first embodiment controls the folder structure, the file structure, and the folder and file names according to the user's selection. The second embodiment describes a case in which the file structure and file names are controlled according to the user's selection.

The image processing apparatus in this embodiment is configured as shown in FIG. 9 instead of FIG. 1, and has a file generation unit 405 in place of the folder generation unit 404.

FIG. 10 illustrates a file management method for digitized image data in this embodiment. In the figure, components having the same functions as in the first embodiment are given the same reference numerals; the only processing unit specific to this embodiment is the file generation unit 405 in the main control unit 40. Therefore, the functions and operations of the image input unit 10, the storage unit 20, the operation panel unit 30, and the character determination unit 401, match determination unit 402, and display control unit 403 in the main control unit 40 are the same as described in the first embodiment. This embodiment differs from the first embodiment in the operation following display control by the display control unit 403; that is, the control after the user selects a matching or non-matching portion is what characterizes this embodiment. The control when a matching portion is selected and when a non-matching portion is selected is therefore described below, using the input image data shown in FIG. 5.

FIG. 10A shows the result of file generation by the file generation unit 405 when the matching character string "Invoice" is selected. When a matching string is selected, the file generation unit 405 uses the selected string as the name of the file to create in the storage unit 20. For example, if the selected matching string is "Invoice", the file is named "Invoice" (1001). The intermediate files of FIGS. 5A, 5B, and 5C generated by the character determination unit 401 are stored merged into this file. This control makes it easy to generate a file name from the character string the user intends.

Next, control by the file generation unit 405 when a non-matching portion is selected will be described. FIG. 10B shows the result of file generation by the file generation unit 405 when the non-matching character string "AAA Co., Ltd." is selected. When a non-matching string is selected, the file generation unit 405 uses, as the names of the files to create in the storage unit 20, the strings found at the position of the selected string in each document. In this description, the selected non-matching string is "AAA Co., Ltd.". Therefore, the string at the position of "AAA Co., Ltd." in each image data, namely the string preceding the honorific "御中" in each input image data, becomes the name of the corresponding file (1002, 1003, 1004). In addition, when a non-matching string is selected, the digitized image data input through the image input unit 10 is stored as separate files.

That is, for the input image data of FIG. 5A, the digitized file of FIG. 5A is stored under the file name "AAA Co., Ltd.". Similarly, the digitized file of FIG. 5B is stored under the file name "BBB Co., Ltd.", and the digitized file of FIG. 5C under the file name "CCC Construction". In other words, in this embodiment the intermediate file of each input image data is renamed and stored. This control makes it easy to generate file names from the character strings the user intends.
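In the second embodiment only file names change, so the plan reduces to a mapping from a file name to the intermediate files stored under it. The sketch below is illustrative: the function names are hypothetical, and rejecting duplicate names is an assumption, since the source does not say what happens when two documents yield the same string.

```python
def plan_integrated_file(selected: str, intermediate_files: list[str]) -> dict[str, list[str]]:
    """Matching string selected (FIG. 10A): one file named after it, merging all documents."""
    if not intermediate_files:
        raise ValueError("nothing to integrate")
    return {selected: list(intermediate_files)}


def plan_individual_files(strings_at_position: list[str],
                          intermediate_files: list[str]) -> dict[str, list[str]]:
    """Non-matching string selected (FIG. 10B): each document's intermediate file is
    renamed to the string found at the selected position in that document."""
    if len(strings_at_position) != len(intermediate_files):
        raise ValueError("need exactly one name per document")
    plan: dict[str, list[str]] = {}
    for name, source in zip(strings_at_position, intermediate_files):
        if name in plan:
            # Assumption: collisions are rejected; the source does not cover this case.
            raise ValueError(f"duplicate file name: {name!r}")
        plan[name] = [source]
    return plan
```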

Next, the control flow of the main control unit 40 will be described. FIG. 11 shows the control flow in this embodiment. The processing in each step of this flow is carried out by loading a computer-executable program describing the following procedure from a ROM (not shown) into a RAM (not shown) and executing it with a CPU (not shown).

When the user selects the operation mode for digitizing a plurality of image data having the same form (step S201), the image input unit 10 accepts image data (step S202). The accepted image data is scanned image data if it is input by optical reading, or received data if it is input via a communication network. When the image input unit 10 determines in step S203 that all the image data has been input, the process proceeds to step S204.

In step S204, the main control unit 40 performs character determination processing with the character determination unit 401, consisting of layout analysis, data extraction, feature extraction, matching, and storage of the matching results for each sheet input by the image input unit 10.

Next, in step S205, the main control unit 40 determines whether the operation mode set by the user is the mode for digitizing a plurality of image data having the same form; if not, the process proceeds to step S212, where the file is stored under a predetermined name. If, on the other hand, the main control unit 40 determines in step S205 that the operation mode set by the user is the mode for digitizing a plurality of image data having the same form, the process proceeds to step S120. After the flow of step S120 has determined the pages to be processed, the process proceeds to step S206, where the match determination unit 402 performs match determination processing. When this processing is finished, the process proceeds to step S207, and the display control unit 403 of the main control unit 40 controls the display of the matching and non-matching portions.

Next, if the user selects a matching portion in step S208, the process proceeds to step S209, where the file generation unit 405 of the main control unit 40 generates a file reflecting the content of the selected matching portion. The generated file is a single file into which the intermediate files of all the input image data are merged. If, on the other hand, it is determined in step S208 that a non-matching portion has been selected, the process proceeds to step S210. If it is determined in step S210 that the portion selected by the user is a differing portion, the process proceeds to step S211, where the file generation unit 405 of the main control unit 40 generates, for each document, a file reflecting the content at the selected mismatch position in the corresponding input image data. The files stored in steps S209 and S211 are named according to the selection of a matching or non-matching portion, as already described.

As described above, in this embodiment, when a plurality of image data whose pages do not all share the same format are digitized, files are built per document in a form that corresponds to the user's selection (a matching portion or a non-matching portion). Furthermore, the file names of the stored digitized image data also match the user's selection, which improves convenience for the user.

[Third Embodiment]
Next, a third embodiment of the present invention will be described. This embodiment concerns control when, for digitizing a plurality of image data having the same form, an operation mode specifying whether the generated files are integrated (generated as a single file) or generated individually has been set in advance. The image processing apparatus of this embodiment has the configuration shown in FIG. 1, comprising the image input unit 10, the storage unit 20, the operation panel unit 30, and the main control unit 40. In the following description, the image data input to the image input unit 10 is the image data shown in FIG. 5.

The control flow in this embodiment is described below with reference to the drawings. FIG. 12 shows the control flow in this embodiment. The processing in each step of this flow is carried out by loading a computer-executable program describing the following procedure from a ROM (not shown) into a RAM (not shown) and executing it with a CPU (not shown).

In step S301, when the user sets an operation mode specifying whether a plurality of image data having the same form are to be digitized into an integrated file or into individual files, the process proceeds to step S302. In step S302, the image input unit 10 accepts image data. As in the first and second embodiments, the accepted image data is scanned image data if it is input by optical reading, or received data if it is input via a communication network. The image input unit 10 continues accepting image data until all of it has been input (No in step S303).

If it is determined in step S303 that all of the image data has been input to the image input unit 10, the process proceeds to step S304.

In step S304, the main control unit 40 performs character determination processing by means of the character determination unit 401. This processing consists of layout analysis, data cutout, feature extraction, collation, and storage of the collation results, performed on each sheet input through the image input unit 10.

When the character determination by the character determination unit 401 ends, the process proceeds to step S120. After the processing target page has been determined through the flow of step S120, the process proceeds to step S305, where the match determination unit 402 of the main control unit 40 performs match determination processing. Next, in step S306, if the operation mode set by the user is the mode in which the digitized results of the input image data are generated in an integrated manner, the main control unit 40 proceeds to step S307. In step S307, the display control unit 403 displays the character strings that the match determination unit 402 determined to match, in a form that allows them to be selected.
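The match determination of step S305 can be sketched as follows. This is a minimal illustration, not the method disclosed in the patent: it assumes that the character determination of step S304 has already reduced each document set's processing target page to a dictionary mapping a hypothetical layout position key to the recognized string, and that only positions present on every page are compared. The names `classify_common_positions`, `Position`, and `PageStrings` are invented for this sketch.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical model: a layout position key (e.g. a quantized block origin)
# mapped to the string recognized there on one document set's target page.
Position = Tuple[int, int]
PageStrings = Dict[Position, str]


def classify_common_positions(
    pages: List[PageStrings],
) -> Tuple[Dict[Position, str], Set[Position]]:
    """Split the positions shared by every page into matched and mismatched.

    matched maps a position to the string that is identical on every page;
    mismatched holds positions whose strings differ on at least one page.
    """
    if not pages:
        return {}, set()
    # Assumption: only positions present on every page are compared.
    common = set(pages[0]).intersection(*pages[1:])
    matched: Dict[Position, str] = {}
    mismatched: Set[Position] = set()
    for pos in common:
        texts = {page[pos] for page in pages}
        if len(texts) == 1:
            matched[pos] = texts.pop()
        else:
            mismatched.add(pos)
    return matched, mismatched


pages = [
    {(0, 0): "Invoice", (0, 1): "AAA Co., Ltd.", (5, 5): "Total"},
    {(0, 0): "Invoice", (0, 1): "BBB Co., Ltd."},
    {(0, 0): "Invoice", (0, 1): "CCC Construction"},
]
matched, mismatched = classify_common_positions(pages)
print(matched)     # {(0, 0): 'Invoice'}
print(mismatched)  # {(0, 1)}
```

Here the title position yields a matched string, the addressee position yields a mismatch, and the "Total" block is ignored because it does not appear on every page.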

FIG. 13(a) shows an example of the state of the operation panel unit 30 in this case. In FIG. 13(a), the shaded character strings are those that the match determination unit 402 determined to match across the input image data, that is, across FIGS. 5(a), 5(b), and 5(c). These shaded character strings can be selected by the user. The lightly displayed character strings are those that the match determination unit 402 determined not to match across the input image data; they cannot be selected by the user. The selectable matched character strings may instead be distinguished by color coding on a color display, or by a display form such as steady lighting or flashing.

In step S308, when the user selects one of the matched character strings, the process proceeds to step S309, and the main control unit 40 creates in the storage unit 20 a folder whose name uses the selected character string. It then integrates the intermediate files of FIGS. 5(a), 5(b), and 5(c) stored in the storage unit 20 and stores the result in that folder under a file name that uses the selected matched character string. For example, if the user selects the matched character string "Invoice" (御請求書), the folder is named "Invoice", and the file integrating FIGS. 5(a), 5(b), and 5(c) is stored in it under the name "Invoice_001".
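The naming and integration of step S309 can be sketched as below. This is a hedged sketch under stated assumptions: each intermediate file is modeled as a hypothetical list of page identifiers, `PurePosixPath` is used only to compose the folder and file names (nothing is written to disk), and the serial number is a parameter defaulting to 1 because the text does not specify how the device chooses it. The function names are invented for illustration.

```python
from pathlib import PurePosixPath
from typing import List


def integrated_output_path(selected: str, serial: int = 1) -> PurePosixPath:
    """Folder named after the selected matched string, file named <string>_<NNN>."""
    # Zero-padded three-digit serial, mirroring the "_001" suffix in the example.
    return PurePosixPath(selected) / f"{selected}_{serial:03d}"


def integrate(intermediate_files: List[List[str]]) -> List[str]:
    """Concatenate the pages of each intermediate file, in input order, into one document."""
    merged: List[str] = []
    for pages in intermediate_files:
        merged.extend(pages)
    return merged


print(integrated_output_path("Invoice"))  # Invoice/Invoice_001
print(integrate([["5a-p1", "5a-p2"], ["5b-p1"], ["5c-p1"]]))  # ['5a-p1', '5a-p2', '5b-p1', '5c-p1']
```

A real device would also need to sanitize characters that are not allowed in folder or file names; that step is omitted here.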

On the other hand, if in step S306 the operation mode set by the user is the mode in which the digitized results of the input image data are generated individually, the main control unit 40 proceeds to step S310. In step S310, the display control unit 403 displays the character strings that the match determination unit 402 determined not to match, in a form that allows them to be selected.

FIG. 13(b) shows an example of the state of the operation panel unit 30 in this case. In FIG. 13(b), the shaded character strings are those that the match determination unit 402 determined not to match across the input image data. These shaded character strings can be selected by the user. The lightly displayed character strings are those that the match determination unit 402 determined to match across all of FIGS. 5(a), 5(b), and 5(c); they cannot be selected by the user. The selectable mismatched character strings may instead be distinguished by color coding on a color display, or by a display form such as steady lighting or flashing.
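The mode-dependent branch of step S306, which decides whether matched or mismatched strings are the selectable (shaded) ones, can be sketched as follows. The `Mode` enum, the `(text, is_match)` pair representation, and `display_entries` are hypothetical names for this illustration; the sketch only computes selectability and does not model the panel rendering itself.

```python
from enum import Enum
from typing import List, Tuple


class Mode(Enum):
    INTEGRATED = "integrated"  # one file for all document sets (S307 to S309)
    INDIVIDUAL = "individual"  # one file per document set (S310 to S312)


def display_entries(entries: List[Tuple[str, bool]], mode: Mode) -> List[Tuple[str, bool]]:
    """Map (text, is_match) pairs to (text, selectable) pairs for the panel.

    Matched strings are selectable (shaded) only in INTEGRATED mode;
    mismatched strings are selectable only in INDIVIDUAL mode. Everything
    else would be drawn in the light, non-selectable style.
    """
    want_match = mode is Mode.INTEGRATED
    return [(text, is_match == want_match) for text, is_match in entries]


entries = [("Invoice", True), ("AAA Co., Ltd.", False)]
print(display_entries(entries, Mode.INTEGRATED))  # [('Invoice', True), ('AAA Co., Ltd.', False)]
print(display_entries(entries, Mode.INDIVIDUAL))  # [('Invoice', False), ('AAA Co., Ltd.', True)]
```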

In step S311, when the main control unit 40 determines that the user has selected one of the mismatched character strings, the process proceeds to step S312. In step S312, for each piece of input image data, a folder is created in the storage unit 20 using the character string located at the position of the selected mismatched character string, and the file generated for each document is stored in the storage unit 20.

Specifically, the intermediate files of FIGS. 5(a), 5(b), and 5(c) are each stored in the corresponding folder. For example, if the user selects the mismatched character string "AAA Co., Ltd." (株式会社ＡＡＡ), folders named "AAA Co., Ltd.", "BBB Co., Ltd." (ＢＢＢ（株）), and "CCC Construction" (ＣＣＣ工務店) are created. The intermediate files of FIGS. 5(a), 5(b), and 5(c) are then renamed "AAA Co., Ltd._001", "BBB Co., Ltd._001", and "CCC Construction_001", respectively, and stored in those folders.
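The per-set naming of step S312 can be sketched as below. It assumes the same hypothetical model as a position-keyed dictionary of recognized strings for each document set's target page; `individual_output_paths` and `Position` are invented names, the serial defaults to 1 as in the example, and only path names are composed (no files are created).

```python
from pathlib import PurePosixPath
from typing import Dict, List, Tuple

Position = Tuple[int, int]  # hypothetical layout position key


def individual_output_paths(
    pages: List[Dict[Position, str]], selected_pos: Position, serial: int = 1
) -> List[PurePosixPath]:
    """One <string>/<string>_<NNN> path per document set, using the string
    found at the selected mismatch position on that set's page."""
    paths: List[PurePosixPath] = []
    for page in pages:
        name = page[selected_pos]  # differs from set to set by construction
        paths.append(PurePosixPath(name) / f"{name}_{serial:03d}")
    return paths


pages = [
    {(0, 0): "Invoice", (0, 1): "AAA Co., Ltd."},
    {(0, 0): "Invoice", (0, 1): "BBB Co., Ltd."},
    {(0, 0): "Invoice", (0, 1): "CCC Construction"},
]
for path in individual_output_paths(pages, (0, 1)):
    print(path)
# AAA Co., Ltd./AAA Co., Ltd._001
# BBB Co., Ltd./BBB Co., Ltd._001
# CCC Construction/CCC Construction_001
```

Selecting one position once is enough to name every output, because the same position is looked up on each document set.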

In this embodiment, the character strings displayed on the display unit 301 need not be all of the matched character strings or all of the mismatched character strings. For example, the character size (font size) and character-string length of the strings to be displayed as match or mismatch candidates may be registered in advance in the storage unit 20, and only the matched or mismatched character strings that satisfy the registered conditions may be displayed. Alternatively, keywords may be registered in advance in the storage unit 20, and only the matched or mismatched character strings corresponding to the registered keywords may be displayed.
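The candidate filtering described above can be sketched as follows. The text does not say how the registered size and length are compared, so this sketch assumes a minimum font size, a maximum string length, and substring matching for keywords; `Candidate` and `filter_candidates` are hypothetical names, and each condition is applied only when it has been registered.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Sequence


@dataclass
class Candidate:
    text: str
    font_size: float  # recognized character height; the unit is an assumption


def filter_candidates(
    candidates: Iterable[Candidate],
    min_font_size: Optional[float] = None,
    max_length: Optional[int] = None,
    keywords: Optional[Sequence[str]] = None,
) -> List[Candidate]:
    """Keep only candidates satisfying every pre-registered condition that is set."""
    kept: List[Candidate] = []
    for c in candidates:
        if min_font_size is not None and c.font_size < min_font_size:
            continue  # too small, e.g. footer text
        if max_length is not None and len(c.text) > max_length:
            continue  # too long to be a useful folder or file name
        if keywords and not any(k in c.text for k in keywords):
            continue  # contains none of the registered keywords
        kept.append(c)
    return kept


cands = [Candidate("Invoice", 24), Candidate("AAA Co., Ltd.", 12), Candidate("Page 1 of 3", 8)]
print([c.text for c in filter_candidates(cands, min_font_size=10)])  # ['Invoice', 'AAA Co., Ltd.']
print([c.text for c in filter_candidates(cands, keywords=["Co., Ltd."])])  # ['AAA Co., Ltd.']
```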

As described above, in this embodiment the candidates for the folder name and file name generated for each document are, depending on the operation mode set in advance, either the character strings determined to match or the character strings determined not to match. This improves user convenience.

(Other Embodiments)
The present invention can also be realized by executing the following processing: software (a program) that implements the functions of the above-described embodiments is supplied to a system or apparatus via a network or various storage media, and a computer (or a CPU, MPU, or the like) of that system or apparatus reads out and executes the program.

Claims

An image processing apparatus comprising:
input means for inputting image data obtained by reading a plurality of document sets, each composed of a plurality of pages;
display control means for comparing, page by page, a character string included in first image data corresponding to a first document set with a character string included in second image data corresponding to a second document set, among the image data input by the input means, and for displaying a character string determined to match at a position common to the character strings included in the first image data and the second image data in a display form different from that of a character string not determined to match; and
control means for performing control so as to store the first image data in a storage unit using a character string designated from among the character strings displayed by the display control means.

The image processing apparatus according to claim 1, wherein the selection means compares a page constituting the first image data with a page constituting the second image data and selects a page determined to be similar as the page to be processed.

The image processing apparatus according to claim 1, wherein the selection means selects the last page constituting the first image data and the last page constituting the second image data as the pages to be processed.

The image processing apparatus according to claim 1, wherein the control means stores the first image data as an electronic file in the storage unit and assigns the file a file name that uses the designated character string.

The image processing apparatus according to claim 1, wherein the control means stores the first image data as an electronic file in the storage unit and assigns a folder name that uses the designated character string to the folder storing the file.

An image processing method comprising:
an input step of inputting image data obtained by reading a plurality of document sets, each composed of a plurality of pages;
a display control step of comparing, page by page, a character string included in first image data corresponding to a first document set with a character string included in second image data corresponding to a second document set, among the image data input in the input step, and displaying a character string determined to match at a position common to the character strings included in the first image data and the second image data in a display form different from that of a character string not determined to match; and
a control step of performing control so as to store the first image data in a storage unit using a character string designated from among the character strings displayed in the display control step.

A program for causing a computer to execute the image processing method according to claim 6.