JP5673277B2 - Image processing apparatus and program

Info

Publication number: JP5673277B2
Application number: JP2011065403A
Authority: JP
Inventors: 訓稔山本
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2011-03-24
Filing date: 2011-03-24
Publication date: 2015-02-18
Anticipated expiration: 2031-03-24
Also published as: JP2012204906A

Description

The present invention relates to an image processing apparatus and a program.

Techniques have been developed for recognizing characters in a document image read by an image reading unit such as a scanner and making use of the recognized character strings.

Patent Document 1 discloses a file name generation system including a generation unit that generates a file name to be assigned to an image file, and a character string extraction unit that performs character recognition processing on the image file to which the file name is to be assigned and can extract, as an extracted character string, a character string that becomes an element of the file name.

Patent Document 2 discloses an electronic filing device that searches image data for the character string with the largest character size, extracts the image data of a predetermined area containing that character string, and registers it as the document title.

Patent Document 3 discloses a form reading apparatus that detects a seal imprint area in which a seal has been stamped, generates a binary image in which the imprint is erased from the color image of the imprint area, pastes that binary image onto the input black-and-white binary image to composite the two, and recognizes characters based on the composited binary image.

Patent Document 1: JP 2010-165019 A
Patent Document 2: JP H07-200625 A
Patent Document 3: JP 2005-92543 A

An object of the present invention is to provide an image processing apparatus and a program that acquire, from a document on which a user has written, a character string reflecting that writing.

The image processing apparatus according to claim 1 includes: a character string extraction unit that performs character recognition processing on a predetermined first region in image data of a first color component, among a plurality of color components constituting image data generated by reading a document, and extracts a character string; an other-color character recognition unit that, when an image of another color component present in a second region containing the extracted character string, in image data of another color component different from the first color component, and the extracted character string are in a predetermined relationship, performs character recognition processing on the second region in the image data of the other color component and recognizes characters of the other color component; and a correction unit that corrects the extracted character string based at least on the relationship between the image of the other color component and the extracted character string.

The image processing apparatus according to claim 2 is the image processing apparatus according to claim 1, wherein, when the image of the other color component overlaps the extracted character string, the correction unit deletes from the extracted character string those characters that overlap the image of the other color component.

The image processing apparatus according to claim 3 is the image processing apparatus according to claim 1 or 2, wherein, when the image of the other color component overlaps the extracted character string, the correction unit replaces those characters of the extracted character string that overlap the image of the other color component with the characters of the other color component recognized by the other-color character recognition unit.

The image processing apparatus according to claim 4 is the image processing apparatus according to any one of claims 1 to 3, wherein, when a predetermined symbol is present in the second region in the image data of the other color component, the correction unit inserts the characters recognized by the other-color character recognition unit into the extracted character string.

The image processing apparatus according to claim 5 is the image processing apparatus according to any one of claims 1 to 4, further including: a dictionary storage unit that stores character strings that have been used by the user; and a character string determination unit that, when the character string corrected by the correction unit is not contained in the dictionary storage unit, detects, from among the character strings stored in the dictionary storage unit, a character string similar to the corrected character string and determines it as the extraction result for the character string contained in the first region.

The program according to claim 6 causes a computer to execute: a character string extraction step of performing character recognition processing on a predetermined first region in image data of a first color component, among a plurality of color components constituting image data generated by reading a document, and extracting a character string; an other-color character recognition step of, when an image of another color component present in a second region containing the extracted character string, in image data of another color component different from the first color component, and the extracted character string are in a predetermined relationship, performing character recognition processing on the second region in the image data of the other color component and recognizing characters of the other color component; and a correction step of correcting the extracted character string based at least on the relationship between the image of the other color component and the extracted character string.

According to the image processing apparatus of claim 1, a character string reflecting the user's writing is acquired from a document on which the user has written.

According to the image processing apparatus of claim 2, a character string is acquired that reflects writing the user made on the original document with the intention of deleting characters.

According to the image processing apparatus of claim 3, a character string is acquired that reflects writing the user made on the original document with the intention of replacing characters.

According to the image processing apparatus of claim 4, a character string is acquired that reflects writing the user made on the original document with the intention of inserting characters.

According to the image processing apparatus of claim 5, the recognition accuracy of the character string presented as the extraction result is improved compared with a configuration that lacks this feature.

According to the program of claim 6, a character string reflecting the user's writing is acquired from a document on which the user has written.

A diagram illustrating an example of the configuration of an image processing system including the image processing apparatus according to the embodiment.
A diagram illustrating an example of the hardware configuration of the image processing apparatus.
A functional block diagram illustrating an example of the functions of the image processing apparatus.
A diagram illustrating an example of the image data produced by the processing of the color separation unit.
A diagram for explaining the region from which a character string serving as a file name candidate is extracted and the region containing that character string.
A diagram illustrating an example of the information stored in the symbol storage unit.
A diagram illustrating an example of the user dictionary stored in the dictionary storage unit.
A flowchart illustrating an example of the file name determination processing executed by the image processing apparatus.
A flowchart illustrating an example of the file name determination processing executed by the image processing apparatus.
A diagram for explaining file name candidate 1 obtained when the processing of steps S11 to S47 in FIG. 8 is executed on each piece of image data received from the image reading device.

Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings.

First, the image processing apparatus according to the embodiment will be described. FIGS. 1A and 1B are diagrams illustrating examples of the configuration of an image processing system including the image processing apparatus according to the embodiment.

In FIG. 1A, an image processing system 100A includes an information processing apparatus 200 and an image forming apparatus 300. The information processing apparatus 200 and the image forming apparatus 300 are connected by a communication unit 400 and can communicate with each other. An example of the information processing apparatus 200 is a personal computer. Examples of the image forming apparatus 300 include a copier, a printer, and a facsimile (FAX) transceiver, as well as a so-called multifunction machine that combines two or more of a copy function, a print function, a FAX transmission/reception function, and the like.

The image forming apparatus 300 includes a communication device 10, an image processing apparatus 20, an image reading device 30, and an image output device 40. The communication device 10 exchanges data, via the communication unit 400, with other apparatuses connected to the image forming apparatus 300 (for example, the information processing apparatus 200). For example, the communication device 10 receives a processed image from the image processing apparatus 20 and transmits it to the information processing apparatus 200. The communication device 10 also, for example, outputs image data received from the information processing apparatus 200 to the image processing apparatus 20.

The image reading device 30 is, for example, a scanner. It reads a document placed on the document table and outputs the read document image to the image processing apparatus 20.

The image processing apparatus 20 converts the image data received from the communication device 10 into a data format that the image output device 40 can output. The image processing apparatus 20 also receives the image data read by the image reading device 30 and converts it into a data format that the image output device 40 can output. The image processing apparatus 20 outputs the converted image data to the image output device 40.

In addition, based on the image data read by the image reading device 30, the image processing apparatus 20 automatically determines the file name to be given to that image data and transmits it to the information processing apparatus 200 via the communication device 10.

The image output device 40 receives the image data to be output from the image processing apparatus 20, forms an image based on the received image data on paper, and outputs it.

FIG. 1A shows an embodiment in which the image forming apparatus 300 includes the image processing apparatus 20. FIG. 1B shows an embodiment in which an information processing apparatus (for example, a personal computer) functions as the image processing apparatus. In FIG. 1B, an image processing system 100B includes an information processing apparatus 500 that functions as the image processing apparatus, and an image reading device 600. The information processing apparatus 500 and the image reading device 600 are connected by the communication unit 400 and can communicate with each other. The image reading device 600 is, for example, a scanner. Based on an instruction from the information processing apparatus 500, it reads a document placed on the document table and transmits the read document image to the information processing apparatus 500.

The information processing apparatus 500 instructs the image reading device 600 to read a document and receives the document image that the image reading device 600 has read. The information processing apparatus 500 stores the received document image in its own storage device, and determines the file name under which the document image is stored based on the document image.

The following describes the image processing apparatus 20 included in the image forming apparatus 300. However, the information processing apparatus 500 of FIG. 1B may function as the image processing apparatus instead. The description below covers an example in which character strings in the image data read by the image reading device 30 are recognized and a recognized character string is used as a file name. In the following, a document before the user writes on it is called the "original document", and the document after the user has written on it is simply called the "document". In this embodiment, the original document is printed in black and white, and the user's writing is made in a color other than black.

An example of the hardware configuration of the image processing apparatus 20 will now be described. FIG. 2 is a diagram illustrating an example of the hardware configuration of the image processing apparatus 20.

The image processing apparatus 20 includes an input/output unit 201, a ROM (Read Only Memory) 202, a CPU (Central Processing Unit) 203, a RAM (Random Access Memory) 204, and an HDD (Hard Disk Drive) 205.

The input/output unit 201 exchanges data with the communication device 10, the image reading device 30, and the image output device 40. The ROM 202 stores, among other things, a program that creates a file name from the image data received from the image reading device 30 (described in detail later). The CPU 203 reads and executes the program stored in the ROM 202. The RAM 204 holds temporary data used while the program runs. The HDD 205 stores the image data received from the image reading device 30, data on the insertion-type symbols used by the file name creation program (described in detail later), and the like.

Next, an example of the functions of the image processing apparatus 20 will be described. FIG. 3 is a functional block diagram illustrating an example of those functions. The image processing apparatus 20 includes a color separation unit 211, a character string extraction unit 212, a correction unit 213, a symbol storage unit 214, an other-color character recognition unit 215, a file name determination unit 216, and a dictionary storage unit 217. The symbol storage unit 214 and the dictionary storage unit 217 are implemented by, for example, the HDD 205. The color separation unit 211, the character string extraction unit 212, the correction unit 213, the other-color character recognition unit 215, and the file name determination unit 216 are realized by the CPU 203 executing the program stored in the ROM 202. The file name determination unit 216 is an example of a character string determination unit.

The color separation unit 211 receives, from the image reading device 30, the image data for which a file name is to be created. The color separation unit 211 may instead obtain image data stored in the HDD 205 or the like as the image data for which a file name is to be created.

The color separation unit 211 separates the image data received from the image reading device 30 into image data for each color component, for example CMYK (C: cyan, M: magenta, Y: yellow, K: black). Alternatively, the color separation unit 211 may separate the image data into K-color image data and image data of the color components other than K (hereinafter, other-color image data). For example, suppose the color separation unit 211 receives the image data shown in FIG. 4A from the image reading device 30. In FIG. 4A, the characters "標準化規格会議議事録" (roughly, "Standardization Standards Meeting Minutes") and the characters from "■日時：２０１０／１０／１５" ("Date and time: 2010/10/15") onward were printed on the original document in advance, and all other characters and figures (FIG. 4C) were written by the user. The pre-printed characters are K color. In this case, the color separation unit 211 separates the image data received from the image reading device 30 into K-color image data (FIG. 4B) and other-color image data (FIG. 4C).
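The split into K-color and other-color image data can be illustrated with a minimal sketch. It assumes the scan arrives as RGB pixels and classifies each pixel with two simple thresholds; the function name `separate_k_and_other` and the parameters `white_min` and `chroma_min` are hypothetical and do not come from the patent, which does not say how the separation is computed.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0-255
Plane = List[List[int]]        # 1 = ink of this plane, 0 = nothing


def separate_k_and_other(
    image: List[List[Pixel]],
    white_min: int = 200,      # assumed threshold: brighter than this is paper
    chroma_min: int = 60,      # assumed threshold: more colorful than this is not K
) -> Tuple[Plane, Plane]:
    """Split an RGB image into a K (black/gray ink) plane and an other-color plane."""
    k_plane: Plane = []
    other_plane: Plane = []
    for row in image:
        k_row, other_row = [], []
        for r, g, b in row:
            is_paper = min(r, g, b) >= white_min
            is_colored = (max(r, g, b) - min(r, g, b)) >= chroma_min
            # Non-paper, low-chroma pixels are treated as pre-printed K ink.
            k_row.append(int(not is_paper and not is_colored))
            # Non-paper, high-chroma pixels are treated as the user's colored writing.
            other_row.append(int(not is_paper and is_colored))
        k_plane.append(k_row)
        other_plane.append(other_row)
    return k_plane, other_plane


# One row: white paper, a black printed pixel, a red handwritten pixel.
k, other = separate_k_and_other([[(255, 255, 255), (10, 10, 10), (220, 30, 30)]])
```

A real device would more likely convert to CMYK and work per channel; the two-plane form is used here only because the text allows separating into K and "everything else".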

The color separation unit 211 outputs the image data of each color component to the character string extraction unit 212, the correction unit 213, and the other-color character recognition unit 215.

The character string extraction unit 212 receives the image data of each color component from the color separation unit 211. From among these, it extracts a character string that becomes a file name candidate (hereinafter, the file name candidate character string) from, for example, the K-color image data. Specifically, the character string extraction unit 212 performs character recognition on a predetermined region (referred to as region A) in the K-color image data and extracts a character string as a file name candidate. For example, suppose the first line contained in the K-color image data is to be used as the file name. In this case, region A is, for example, the rectangular region extending from the upper-left origin of the K-color image data to the line just before the K-color character string on the second line ("■日時：２０１０／１０／１５"), shown by the broken line in FIG. 5A. The method of determining region A is not limited to that of this embodiment.

The character string extraction unit 212 outputs the extracted character string to the correction unit 213. In the example of FIG. 5A, it extracts the character string "標準化規格会議議事録", which becomes the file name candidate character string. The character string extraction unit 212 also outputs to the correction unit 213 the coordinate information of where the extracted character string lies in the image.

The correction unit 213 receives the image data of each color component from the color separation unit 211. It also receives, from the character string extraction unit 212, the file name candidate character string and the coordinate information of where that character string lies in the image.

The correction unit 213 corrects the file name candidate character string based on the file name candidate character string present in the K-color image data and on the image data of the color components other than K (CMY). Specifically, the correction unit 213 corrects the file name candidate character string based on the figures, symbols, and characters present in the region of the other-color image data that contains the file name candidate character string (referred to as region B). In this embodiment, as shown in FIG. 5B, with the upper-left corner of the image data as the origin, the vertical direction as the Y axis, and the horizontal direction as the X axis, region B is a region (shown by the dash-dot line in the figure) centered on the Y coordinate at which the file name candidate character string lies (shown by the dotted line in the figure) and spanning a predetermined number of pixels in the vertical direction. Although region A and region B differ in position and size in this embodiment, they may be the same region.
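As a rough illustration of how region B might be derived from the coordinate information of the candidate string, the sketch below builds a horizontal band centered on the string's Y coordinate. The names `Rect` and `region_b`, the `band_height` parameter, and the choice to span the full image width are assumptions made for the example; the text only fixes the vertical centering and a predetermined pixel count.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    left: int
    top: int
    right: int    # exclusive
    bottom: int   # exclusive


def region_b(text_box: Rect, image_width: int, image_height: int, band_height: int) -> Rect:
    """Return a band of band_height pixels centered on the candidate string's Y coordinate.

    The origin is the upper-left corner of the image, Y grows downward.
    The band is clipped to the image and, by assumption, spans the full width.
    """
    center_y = (text_box.top + text_box.bottom) // 2
    top = max(0, center_y - band_height // 2)
    bottom = min(image_height, top + band_height)
    return Rect(0, top, image_width, bottom)


# Candidate string occupying rows 200-240 of an 800 x 1000 image, 120-pixel band.
band = region_b(Rect(100, 200, 500, 240), image_width=800, image_height=1000, band_height=120)
```

Making the band taller than the text line is what lets it capture handwriting and insertion marks placed just above or below the printed string.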

The correction unit 213 determines whether a figure stored in the symbol storage unit 214 (described later) is present in region B of the other-color image data, and whether an image contained in the other-color image data lies at a position overlapping the file name candidate character string. If a figure stored in the symbol storage unit 214 is present in region B, or if an image contained in the other-color image data overlaps the file name candidate character string, the correction unit 213 instructs the other-color character recognition unit 215 to perform character recognition processing on the other-color image data. If the other-color character recognition unit 215 recognizes characters, the correction unit 213 corrects the file name candidate character string using the recognized characters. If no characters are recognized, the correction unit 213 corrects the file name candidate character string based on the relationship between the file name candidate character string and the image contained in the other-color image data. The correction processing is described in detail later with reference to flowcharts.
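The overlap case can be sketched as follows. This is one possible reading of the behavior described above combined with claims 2 and 3, not the flowchart procedure itself: characters whose bounding boxes are crossed by other-color strokes are replaced when the other-color recognition returned text, and deleted when it returned nothing. The names `boxes_overlap` and `correct_overlapped`, the bounding-box representation, and the sample strike-through are all hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def boxes_overlap(a: Box, b: Box) -> bool:
    """True when two axis-aligned boxes share at least one pixel."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def correct_overlapped(
    chars: List[Tuple[str, Box]],   # candidate characters from the K plane, in reading order
    marks: List[Box],               # bounding boxes of other-color strokes
    recognized: Optional[str],      # text from other-color recognition, or None
) -> str:
    """Replace (or, with no recognized text, delete) characters covered by other-color marks."""
    out: List[str] = []
    replacement_used = False
    for ch, box in chars:
        if not any(boxes_overlap(box, m) for m in marks):
            out.append(ch)                 # untouched character is kept
        elif recognized and not replacement_used:
            out.append(recognized)         # first covered character: put the replacement here
            replacement_used = True
        # any other covered character is dropped
    return "".join(out)


# Hypothetical layout: each printed character is 10 pixels wide on one line.
text = "標準化規格会議議事録"
chars = [(c, (i * 10, 0, i * 10 + 10, 10)) for i, c in enumerate(text)]
strike = [(30, 4, 50, 6)]  # a colored line drawn through the 4th and 5th characters

deleted = correct_overlapped(chars, strike, None)
replaced = correct_overlapped(chars, strike, "企画")
```

Inserting the replacement once, at the first covered position, keeps a multi-character strike-through from duplicating the handwritten text.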

The correction unit 213 outputs the corrected file name candidate character string to the file name determination unit 216.

The symbol storage unit 214 stores information about figures that signify that a correction to the character string is required. The information stored in the symbol storage unit 214 is described here with reference to FIG. 6. FIG. 6A is a diagram illustrating an example of the information stored in the symbol storage unit 214.

As shown in FIG. 6A, the symbol storage unit 214 stores, for example, the items "No.", "figure", "relative position to the K-color character string", "meaning", and "position of handwritten characters". "No." is a number that uniquely identifies each entry.

"Figure" is the figure whose presence in region B, the region containing the file name candidate character string, is checked. "Relative position to the K-color character string" indicates whether the figure lies above or below the file name candidate character string. "Meaning" indicates the processing applied to the file name candidate character string when the figure is present at that relative position. In the example of FIG. 6A, "meaning" indicates where the characters that the user wrote on the original document are to be inserted. For example, "insert at lower right" means that the handwritten characters are inserted at the position indicated by the lower right of the figure.

"Position of handwritten characters" indicates, when the figure lies at the stated relative position, where relative to the figure the handwritten characters to be inserted into the file name candidate character string are located. For example, "within the specified rectangle, upper left" means that handwritten characters lying within the predetermined region containing the file name candidate character string (region B) and to the upper left of the figure are inserted into the file name candidate character string.

例えば、図６（Ｂ）に示す画像データが存在し、「標準化企会議議事録」のみが、Ｋ色の文字列であるとする。この場合、他色の画像データに含まれる図形は、Ｋ色の文字列に対して相対的に上の位置に存在するため、Ｎｏ．２の図形に該当する。この場合、図形の右上に存在する「画」の手書きの文字が、図形の左下が指す「企」と「会」との間に挿入される。 For example, it is assumed that the image data shown in FIG. 6B exists and only the “standard meeting minutes” is a K-color character string. In this case, since the graphic included in the image data of the other colors exists at a position relatively higher than the character string of K color, It corresponds to the figure of 2. In this case, the handwritten character of “Picture” existing at the upper right of the figure is inserted between “Plan” and “Meeting” indicated by the lower left of the figure.

また、例えば、図６（Ｃ）に示す画像データが存在し、「標準化企会議議事録」のみが、Ｋ色の文字列であるとする。この場合、他色の画像データに含まれる図形は、Ｋ色の文字列に対して相対的に下の位置に存在し、Ｎｏ．９の図形に該当する。この場合、図形の下に存在する「画」の手書きの文字が、図形の頂点の上の位置（「企」と「会」との間）に挿入される。 Further, for example, it is assumed that the image data shown in FIG. 6C exists and only the “standardized meeting minutes” is a character string of K color. In this case, the graphic included in the image data of the other color exists at a lower position relative to the character string of K color. This corresponds to 9 figures. In this case, the handwritten character of “Picture” existing below the figure is inserted at a position above the vertex of the figure (between “plan” and “meeting”).

As a further example, suppose that the image data shown in FIG. 6(D) exists and that only "標準化企会議議事録" is a K-color character string. In this case, the figure contained in the other-color image data lies above the K-color character string and corresponds to figure No. 10. The handwritten character "画" located above the figure is inserted at the position below the vertex of the figure (between "企" and "会").
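The insertion common to FIGS. 6(B) to 6(D) can be sketched as follows. This is only an illustration: the function name `insert_handwritten` and the use of a character index to represent the gap indicated by the figure are assumptions, not part of the described apparatus.

```python
def insert_handwritten(candidate: str, gap_index: int, handwritten: str) -> str:
    """Insert `handwritten` into `candidate` just before the character at `gap_index`.

    `gap_index` stands for the gap that the figure points to (hypothetical
    representation; the source only says the figure indicates a position).
    """
    if not 0 <= gap_index <= len(candidate):
        raise ValueError("gap_index is outside the candidate string")
    return candidate[:gap_index] + handwritten + candidate[gap_index:]


# The figure points at the gap between "企" (index 3) and "会" (index 4).
print(insert_handwritten("標準化企会議議事録", 4, "画"))  # -> 標準化企画会議議事録
```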

Returning to FIG. 3, the description continues. The other-color character recognition unit 215 receives the image data of each color component from the color separation unit 211. It also receives from the correction unit 213 an instruction to execute character recognition processing on the other-color image data. The other-color character recognition unit 215 performs character recognition processing on the image data of the color components other than the K color, from which the character string was extracted, and outputs the recognized characters to the correction unit 213.

The dictionary storage unit 217 stores a user dictionary in which character strings used as file names in the past are registered. FIG. 7 is a diagram illustrating an example of the user dictionary. As shown in FIG. 7, the user dictionary has the items "character string" and "registration date". The "character string" item stores a character string that the user has used as a file name in the past. The "registration date" item stores the date on which that character string was registered. For example, the character string "標準化会議議事録" was registered on October 10, 2010.

The file name determination unit 216 receives the file name candidate character string from the correction unit 213 and determines whether it exists in the user dictionary stored in the dictionary storage unit 217. When the file name candidate character string exists in the user dictionary, the file name determination unit 216 presents the file name candidate character string received from the correction unit 213 to the user. When the user approves the presented file name candidate character string, the correction unit 213 determines the file name candidate character string as the file name and outputs it to the communication device 10.

When the file name candidate character string does not exist in the dictionary stored in the dictionary storage unit 217, the file name determination unit 216 acquires, from the character strings in that dictionary, a character string having a high degree of similarity to the file name candidate character string. The file name determination unit 216 presents the acquired character string and the file name candidate character string to the user and accepts the selection of a file name. The file name determination unit 216 outputs the selected file name to the communication device 10. The communication device 10 attaches the file name determined by the file name determination unit 216 to the image data read by the image reading device 30 and transmits the image data to the information processing apparatus 200. The degree of similarity between the file name candidate character string and a character string in the user dictionary is calculated using, for example, the technique described in Japanese Patent Application Laid-Open No. 2011-8553 or any other conventional technique.
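A minimal sketch of the similarity lookup is shown below. It does not implement the technique of Japanese Patent Application Laid-Open No. 2011-8553; as a stand-in it uses the sequence-matching ratio of Python's standard `difflib` module. The function name `closest_entry`, the `cutoff` value, and the dictionary contents are assumptions made for illustration.

```python
import difflib
from typing import Optional, Sequence


def closest_entry(candidate: str, user_dictionary: Sequence[str],
                  cutoff: float = 0.6) -> Optional[str]:
    """Return the dictionary string most similar to `candidate`, or None.

    Similarity here is difflib's SequenceMatcher ratio (a substitute for the
    similarity measure that the source leaves to known techniques).
    """
    matches = difflib.get_close_matches(candidate, user_dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical dictionary contents.
print(closest_entry("標準化会議議事録", ["標準会議議事録", "出張報告書"]))  # -> 標準会議議事録
```

Any other string-similarity measure could be substituted without changing the surrounding flow.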

Next, an example of the file name determination processing executed by the image processing apparatus 20 will be described. FIGS. 8 and 9 are flowcharts illustrating an example of the processing executed by the image processing apparatus 20.

First, the image processing apparatus 20 accepts from the user the method of determining the file name to be assigned to the image data read by the image reading device 30, that is, whether the user enters the file name manually or the image processing apparatus 20 determines it automatically (step S11). Specifically, the image processing apparatus 20 accepts the file name determination method from the user via an operation screen displayed on a display device (not shown).

Next, the color separation unit 211 determines whether automatic file name determination, in which the image processing apparatus 20 determines the file name automatically, has been selected (step S13).

When automatic file name determination has not been selected (step S13/NO), the file name determination unit 216 accepts input of a file name via the operation screen (step S47) and ends the processing.

When automatic file name determination has been selected (step S13/YES), the color separation unit 211 separates the image received from the image reading device 30 into the C, M, Y, and K color components. Next, the color separation unit 211 binarizes the image separated into each color component and generates a binarized image for each color component (step S17). The data of these binarized images serve as the image data of the respective color components.
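The separation and binarization of step S17 might look like the following sketch. The source does not state how the scanned image is converted to CMYK or which threshold is used; the naive RGB-to-CMYK formula, the 0.5 threshold, and the names `rgb_to_cmyk` and `separate_and_binarize` are all assumptions.

```python
def rgb_to_cmyk(r: float, g: float, b: float):
    """Naive RGB -> CMYK conversion; every value is in the range 0.0 to 1.0."""
    k = 1.0 - max(r, g, b)
    if k == 1.0:  # pure black: no chromatic component
        return 0.0, 0.0, 0.0, 1.0
    c = (1.0 - r - k) / (1.0 - k)
    m = (1.0 - g - k) / (1.0 - k)
    y = (1.0 - b - k) / (1.0 - k)
    return c, m, y, k


def separate_and_binarize(image, threshold: float = 0.5):
    """Split an RGB image (rows of (r, g, b) tuples) into binary C, M, Y, K planes."""
    planes = {name: [] for name in "CMYK"}
    for row in image:
        converted = [rgb_to_cmyk(*pixel) for pixel in row]
        for i, name in enumerate("CMYK"):
            planes[name].append([1 if px[i] >= threshold else 0 for px in converted])
    return planes


# One row: a black pixel (printed text), a red pixel (pen mark), a white pixel (paper).
planes = separate_and_binarize([[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 1.0)]])
print(planes["K"])  # -> [[1, 0, 0]]
print(planes["M"])  # -> [[0, 1, 0]]
```

With this conversion a red annotation appears only in the M and Y planes, so the K plane contains the printed text alone.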

Next, the character string extraction unit 212 selects the K-color image data as the image data from which the file name candidate character string is to be extracted (step S19). The character string extraction unit 212 performs character recognition processing on region A of the K-color image data and extracts the file name candidate character string (step S21).

Next, the correction unit 213 determines a rectangular region B containing the file name candidate character string (step S23). The correction unit 213 determines whether a CMY-color image (a figure image, a character image, or the like) exists within the rectangular region B in the image data of the C, M, and Y color components (step S25). When no CMY-color image exists within region B (step S25/NO), the correction unit 213 determines that the file name candidate character string requires no correction (step S27). The correction unit 213 stores the file name candidate character string extracted in step S21 as file name candidate 1 in, for example, the HDD 205 (step S45).
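The check of step S25 can be sketched as a scan of the binarized C, M, and Y planes inside region B. The representation of a region as a `(left, top, right, bottom)` tuple with exclusive right and bottom edges, and the name `any_mark_in_region`, are assumptions for illustration.

```python
def any_mark_in_region(planes, region) -> bool:
    """Return True if any binary plane has a set pixel inside `region`.

    `planes` is a list of 2-D lists of 0/1 values (e.g. the C, M and Y planes);
    `region` is (left, top, right, bottom) with exclusive right/bottom edges.
    """
    left, top, right, bottom = region
    return any(
        plane[y][x]
        for plane in planes
        for y in range(top, bottom)
        for x in range(left, right)
    )


empty = [[0, 0, 0, 0] for _ in range(3)]
magenta = [[0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]  # one pen pixel at x=2, y=1
print(any_mark_in_region([empty, magenta, empty], (0, 0, 4, 3)))  # -> True
print(any_mark_in_region([empty, magenta, empty], (0, 0, 2, 3)))  # -> False
```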

When a CMY-color image exists within the rectangular region B (step S25/YES), the correction unit 213 determines whether the other-color image (an image in any of the C, M, and Y colors) overlaps the K-color character string acquired in step S21 (step S27).

When the other-color image (an image in any of the C, M, and Y colors) overlaps the K-color character string (step S27/YES), the correction unit 213 determines that the K-color characters overlapping the other-color image are to be corrected (step S29).
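One way to picture steps S27 and S29 is a bounding-box intersection test between each K-color character and each other-color mark. The source does not say whether the overlap is judged on bounding boxes or on individual pixels; the bounding-box approach, the `Box` type, and the function names below are assumptions.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if the two axis-aligned boxes share at least one pixel."""
    return a.left < b.right and b.left < a.right and a.top < b.bottom and b.top < a.bottom


def overlapped_indices(char_boxes: List[Box], mark_boxes: List[Box]) -> List[int]:
    """Indices of K-color characters that overlap any other-color mark (correction targets)."""
    return [
        i for i, cb in enumerate(char_boxes)
        if any(boxes_overlap(cb, mb) for mb in mark_boxes)
    ]


# Three characters, each 10 pixels wide; one mark drawn over the second character.
chars = [Box(0, 0, 10, 10), Box(10, 0, 20, 10), Box(20, 0, 30, 10)]
print(overlapped_indices(chars, [Box(12, 2, 18, 8)]))  # -> [1]
```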

Next, the other-color character recognition unit 215 performs character recognition processing on the other-color image data in region C, which contains the K-color characters overlapping the other-color image (step S31). In the present embodiment, region C is, for example, a region extending a predetermined number of pixels in the vertical direction from the Y coordinate at which the K-color characters overlapping the other-color image are located; however, region C may be set to any region.

The correction unit 213 determines whether a character has been recognized in the other-color image data (step S33).

When no character has been recognized in the other-color image data (step S33/NO), the correction unit 213 deletes the K-color characters overlapping the other-color image from the character string (step S35). When a character has been recognized in the other-color image data (step S33/YES), the correction unit 213 replaces the K-color characters overlapping the other-color image with the other-color characters recognized in step S31 (step S37).
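The branch between deletion (step S35) and replacement (step S37) reduces to a single string operation, sketched below. The function name `apply_overlap_correction` and the use of a `[start, end)` index range for the overlapped characters are assumptions.

```python
from typing import Optional


def apply_overlap_correction(candidate: str, start: int, end: int,
                             recognized: Optional[str]) -> str:
    """Correct the overlapped span candidate[start:end].

    If `recognized` (the other-color characters from step S31) is empty or None,
    the span is deleted (step S35); otherwise it is replaced (step S37).
    """
    return candidate[:start] + (recognized or "") + candidate[end:]


# "規格" occupies indices 3-4 of the K-color string.
print(apply_overlap_correction("標準化規格会議議事録", 3, 5, "企画"))  # -> 標準化企画会議議事録
print(apply_overlap_correction("標準化規格会議議事録", 3, 5, None))    # -> 標準化会議議事録
```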

Next, the correction unit 213 determines whether an insertion-type symbol exists within the rectangular region B (step S39). Here, an insertion-type symbol is a symbol stored in the symbol storage unit 214 that indicates, as shown in FIG. 4(A), the insertion of another character between two characters. In other words, the correction unit 213 determines whether a symbol stored in the symbol storage unit 214 exists within the rectangular region B.

When an insertion-type symbol exists within the rectangular region B (step S39/YES), the other-color character recognition unit 215 performs character recognition processing on region B of the other-color image data and recognizes the other-color characters (step S41). The correction unit 213 then inserts, from among the other-color characters recognized in step S41, the character located at the "handwritten character position" of FIG. 6(A) between the characters of the K-color character string indicated by the symbol (step S43).
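Selecting which recognized character to insert, according to the "handwritten character position", might be sketched as follows. The coordinate convention (image y grows downward), the use of centre points, the position labels, and the name `pick_by_position` are all assumptions; the actual table of FIG. 6(A) is not reproduced here.

```python
def pick_by_position(symbol_xy, recognized, position: str):
    """Return the recognized characters lying at `position` relative to the symbol.

    `symbol_xy` is the (x, y) centre of the insertion-type symbol and
    `recognized` is a list of (character, (x, y)) pairs.
    """
    sx, sy = symbol_xy
    tests = {
        "upper left": lambda x, y: x < sx and y < sy,
        "upper right": lambda x, y: x > sx and y < sy,
        "above": lambda x, y: y < sy,
        "below": lambda x, y: y > sy,
    }
    inside = tests[position]
    return [ch for ch, (x, y) in recognized if inside(x, y)]


# Two other-color characters were recognized; only "企" lies to the upper left of the symbol.
recognized = [("企", (40, 5)), ("画", (70, 5))]
print(pick_by_position((50, 20), recognized, "upper left"))  # -> ['企']
```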

The file name determination unit 216 stores the character string obtained by the processing of steps S27 to S43 as file name candidate 1 (step S47).

Here, examples of file name candidate 1 obtained by the processing of steps S11 to S47 will be described with reference to FIG. 10. FIG. 10 is a diagram for explaining file name candidate 1 obtained when the processing of steps S11 to S47 is executed on each piece of image data received from the image reading device 30.

In example A of FIG. 10, assume that only the characters "標準化規格会議議事録" in the original image data are in the K color. In this case, the file name candidate character string extracted in step S21 is "標準化規格会議議事録". In example A, an other-color image exists in region B (step S25/YES), so the correction unit 213 determines whether the other-color image overlaps the K-color character string (step S27). Because the other-color image (a figure image "X" representing deletion) overlaps "規格" in the K-color character string (step S27/YES), the correction unit 213 determines that the K-color characters "規格" overlapping the other-color image are to be corrected (step S29). The other-color character recognition unit 215 then performs character recognition processing on region C, which contains the K-color characters overlapping the other-color image (step S31). In example A, character recognition processing is performed on region C containing "規格" in the other-color image data, and the characters "企画" are recognized as other-color characters (step S33/YES). The correction unit 213 replaces the K-color characters overlapping the other-color image ("規格") with the other-color characters ("企画") (step S37). In example A, no insertion-type symbol exists in region B (step S39/NO), so the processing of step S41 is not performed. As a result, file name candidate 1 based on the original image data of A is "標準化企画会議議事録".

Next, in example B, assume that only the characters "標準化規格会議議事録" in the original image data are in the K color. In this case, the file name candidate character string extracted in step S21 is "標準化規格会議議事録". In example B, an other-color image exists in region B (step S25/YES), so the correction unit 213 determines whether the other-color image overlaps the K-color character string (step S27). Because the other-color image (a figure image "X" representing deletion) overlaps "規格" in the K-color character string (step S27/YES), the correction unit 213 determines that the K-color characters "規格" overlapping the other-color image are to be corrected (step S29). The other-color character recognition unit 215 then performs character recognition processing on region C, which contains the K-color characters overlapping the other-color image (step S31). In example B, character recognition processing is performed on region C containing "規格" in the other-color image data, but no other-color character exists in region C (step S33/NO), so the correction unit 213 deletes the K-color characters overlapping the other-color image ("規格") from the character string (step S35). In example B, no insertion-type symbol exists in region B (step S39/NO), so the processing of step S41 is not performed. As a result, file name candidate 1 based on the original image data of B is "標準化会議議事録".

Further, in example C, assume that only the characters "標準化格会議議事録" in the original image data are in the K color. In this case, the file name candidate character string extracted in step S21 is "標準化格会議議事録". In example C, an other-color image exists in region B (step S25/YES), so the correction unit 213 determines whether the other-color image overlaps the K-color character string (step S27). Because the other-color image (a figure image "X" representing deletion) overlaps "格" in the K-color character string (step S27/YES), the correction unit 213 determines that the K-color character "格" overlapping the other-color image is to be corrected (step S29). The other-color character recognition unit 215 then performs character recognition processing on region C, which contains the K-color character overlapping the other-color image (step S31). In example C, character recognition is performed on region C containing "格" in the other-color image data, and the character "画" is recognized as an other-color character (step S33/YES). The correction unit 213 replaces the K-color character overlapping the other-color image ("格") with the other-color character ("画") (step S37). After step S37, the file name candidate character string is "標準化画会議議事録". In example C, an insertion-type symbol also exists in region B (step S39/YES), so the other-color character recognition unit 215 performs character recognition processing on region B of the other-color image data (step S41). As a result of step S41, the characters "企" and "画" are recognized as other-color characters. Of "企" and "画", the correction unit 213 inserts "企", which lies to the upper left of the insertion-type symbol, between "化" and "格" as indicated by the figure (step S43). As a result, file name candidate 1 based on the original image data of C is "標準化企画会議議事録".

Next, the processing from steps S45 and S47 onward will be described with reference to FIG. 9. The file name determination unit 216 compares file name candidate 1 with the user dictionary stored in the dictionary storage unit 217 (step S51). The file name determination unit 216 determines whether file name candidate 1 exists in the user dictionary (step S53).

When file name candidate 1 exists in the user dictionary (step S53/YES), the file name determination unit 216 presents only file name candidate 1 to the user (step S55). For example, the file name determination unit 216 displays file name candidate 1 on the display device of the image forming apparatus 300.

Next, the file name determination unit 216 determines whether file name candidate 1 presented in step S55 has been approved by the user as the file name (step S57). When file name candidate 1 has been approved by the user as the file name (step S57/YES), the file name determination unit 216 determines file name candidate 1 as the file name (step S59) and ends the processing.

When file name candidate 1 does not exist in the dictionary (step S53/NO), the file name determination unit 216 searches the user dictionary for a word having a high degree of similarity to file name candidate 1 and sets it as file name candidate 2 (step S61). Next, the file name determination unit 216 presents file name candidate 1 and file name candidate 2 to the user (step S63). For example, suppose that file name candidate 1 is "標準化会議議事録". In this case, "標準化会議議事録" does not exist in the user dictionary shown in FIG. 7. The file name determination unit 216 therefore sets "標準会議議事録", which has a high degree of similarity to "標準化会議議事録", as file name candidate 2, and presents "標準化会議議事録" and "標準会議議事録" to the user.

Next, the file name determination unit 216 determines whether file name candidate 1 has been selected by the user (step S65). When file name candidate 1 has been selected by the user (step S65/YES), the file name determination unit 216 adds file name candidate 1 to the user dictionary (step S67). The file name determination unit 216 then determines file name candidate 1 as the file name (step S69) and ends the processing.

When file name candidate 1 has not been selected by the user (step S65/NO), the file name determination unit 216 determines whether file name candidate 2 has been selected (step S71). When file name candidate 2 has been selected (step S71/YES), the file name determination unit 216 determines file name candidate 2 as the file name (step S73) and ends the processing. Because file name candidate 2 is already registered in the user dictionary, the file name determination unit 216 does not register it again.

When file name candidate 2 has not been selected (step S71/NO), or when the user has not approved file name candidate 1 as the file name in step S57 (step S57/NO), the file name determination unit 216 accepts input of a file name from the user (step S75).

The file name determination unit 216 adds the entered file name to the user dictionary (step S77). The file name determination unit 216 then determines the entered file name as the file name (step S79) and ends the processing.
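The flow of steps S51 to S79 can be condensed into the following sketch. Several simplifications are assumptions: the user dictionary is a plain list of strings without registration dates, the user's choice is modelled by the hypothetical callbacks `choose` and `ask_manual`, similarity is computed with `difflib` rather than the technique cited in the description, and a manually entered name is added to the dictionary only if it is not already present.

```python
import difflib
from typing import Callable, List, Optional


def decide_file_name(candidate1: str,
                     user_dictionary: List[str],
                     choose: Callable[[List[str]], Optional[str]],
                     ask_manual: Callable[[], str]) -> str:
    """Determine the file name from candidate 1, the user dictionary and user input."""
    if candidate1 in user_dictionary:                 # S53/YES
        options = [candidate1]                        # S55: present candidate 1 only
    else:                                             # S53/NO
        similar = difflib.get_close_matches(candidate1, user_dictionary, n=1)  # S61
        options = [candidate1] + similar              # S63: present candidates 1 and 2
    selected = choose(options)                        # S57, S65, S71
    if selected is None:                              # nothing approved or selected
        selected = ask_manual()                       # S75
    if selected not in user_dictionary:
        user_dictionary.append(selected)              # S67, S77
    return selected                                   # S59, S69, S73, S79


dictionary = ["標準会議議事録"]
# The user picks the second option, i.e. the similar dictionary entry (candidate 2).
print(decide_file_name("標準化会議議事録", dictionary, lambda opts: opts[1], lambda: ""))
# -> 標準会議議事録
```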

As is clear from the above description, according to the embodiment described above, the color separation unit 211 separates image data generated by reading a document on which the user has written into a plurality of color components, and the character string extraction unit 212 performs character recognition processing on a predetermined region A of the K-color component image data and extracts a character string serving as a file name candidate. As a result, even when the user has written on an original document printed in the K color using a color other than K (for example, red or blue), the character string serving as the file name candidate is recognized accurately. For example, when character recognition processing is performed on the document image shown in FIG. 4(A) without color separation, the characters "企画" written in a color other than K and the figure "X" representing deletion are also recognized as parts of characters, so the file name candidate character string cannot be recognized correctly. In the example of FIG. 4(A), performing character recognition processing on image data containing the characters "標準化規格会議議事録" yields, for instance, "標準化硫会会議議事録". In the present embodiment, by contrast, the character string extraction unit 212 performs character recognition processing on the color-separated K-color image data, so the character recognition processing is executed without being affected by figures or characters written by the user. Consequently, even when the user has written on the document, the character string serving as the file name candidate is recognized correctly, and the accuracy of the character recognition processing improves.

The other-color character recognition unit 215 performs character recognition processing on region B of the other-color image data and recognizes other-color characters in either of two cases: when an other-color image that exists in region B, the region containing the file name candidate character string, of the image data of a color different from the K-color component overlaps the file name candidate character string, or when a figure image stored in the symbol storage unit 214 exists in region B of the other-color image data. The correction unit 213 corrects the file name candidate character string on the basis of at least one of the relationship between the other-color image and the file name candidate character string and the other-color characters recognized by the other-color character recognition unit 215. A file name candidate character string is thereby obtained that reflects what the user wrote on the K-color character string in a color other than K.

In the embodiment described above, when an other-color image overlaps the file name candidate character string, the correction unit 213 deletes from the file name candidate character string those characters that overlap the other-color image. A file name candidate character string is thereby obtained that reflects writing by which the user intended a deletion on the original document.

In addition, when an other-color image overlaps the file name candidate character string, the correction unit 213 replaces those characters of the file name candidate character string that overlap the other-color image with the other-color characters recognized by the other-color character recognition unit 215. A file name candidate character string is thereby obtained that reflects writing by which the user intended a replacement on the original document.

Further, when an insertion-type symbol exists in region B of the other-color image data, the correction unit 213 inserts the other-color characters recognized by the other-color character recognition unit 215 into the file name candidate character string. A file name candidate character string is thereby obtained that reflects writing by which the user intended the insertion of characters on the original document.

The dictionary storage unit 217 stores a user dictionary in which character strings that the user has used before are registered. When the file name candidate character string is not contained in the dictionary storage unit 217, the file name determination unit 216 detects, from the character strings stored in the dictionary storage unit 217, a character string similar to the file name candidate character string and presents it to the user as a file name. When the file name candidate character string is not in the user dictionary, it is highly likely that a character was misrecognized in the character recognition processing executed by the character string extraction unit 212 and the other-color character recognition unit 215. Using the user dictionary to determine whether the file name candidate character string has been used as a file name in the past therefore improves the accuracy with which the file name is recognized.

The embodiment described above is one of the embodiments of the present invention. The present invention is not limited to it, however, and various modifications can be made without departing from the gist of the present invention.

In the embodiment described above, when a figure stored in the symbol storage unit 214 exists within region B, the other-color character recognition unit 215 executes character recognition processing on region B. However, the other-color character recognition unit 215 may instead determine a region different from region B that contains the insertion-type symbol (for example, a region D) and execute character recognition processing on region D.

The embodiment described above deals with processing that uses a character string contained in image data as a file name. However, the present invention may also be applied when OCR (Optical Character Recognition) processing is performed on a document in which the user has written on the original document.

The functions of the image processing apparatus 20 described above can be realized by a computer including a CPU, a ROM, a RAM, and the like. In that case, a program describing the processing contents of the functions that the image processing apparatus 20 should have is provided. Executing the program on the computer realizes the above processing functions on the computer. The program describing the processing contents can be recorded on a computer-readable recording medium.

When the program is distributed, it is sold, for example, in the form of a portable recording medium such as a DVD (Digital Versatile Disc) or a CD-ROM (Compact Disc Read Only Memory) on which the program is recorded. The program can also be stored in a storage device of a server computer and transferred from the server computer to another computer via a network.

The computer that executes the program stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage device. The computer then reads the program from its own storage device and executes processing according to the program. The computer can also read the program directly from the portable recording medium and execute processing according to it. In addition, each time a program is transferred from the server computer, the computer can successively execute processing according to the received program.

DESCRIPTION OF SYMBOLS
20 ... Image processing apparatus
30 ... Image reading apparatus
211 ... Color separation unit
212 ... Character string extraction unit
213 ... Correction unit
214 ... Symbol storage unit
215 ... Other-color character recognition unit
216 ... File name determination unit
217 ... Dictionary storage unit
500 ... Information processing apparatus
600 ... Image reading apparatus

Claims

An image processing apparatus comprising:
a character string extraction unit that extracts a character string by performing character recognition processing on a predetermined first area in image data of a first color component among a plurality of color components constituting image data generated by reading an original;
an other-color character recognition unit that, when an image of another color component existing in a second area including the extracted character string, in image data of another color component different from the first color component, and the extracted character string are in a predetermined relationship, performs character recognition processing on the second area in the image data of the other color component to recognize a character of the other color component; and
a correction unit that corrects the extracted character string based on at least the relationship between the image of the other color component and the extracted character string.

The image processing apparatus according to claim 1, wherein, when the image of the other color component overlaps the extracted character string, the correction unit deletes, from the extracted character string, a character that overlaps the image of the other color component among the characters included in the extracted character string.

The image processing apparatus according to claim 1, wherein, when the image of the other color component overlaps the extracted character string, the correction unit replaces a character that overlaps the image of the other color component, among the characters included in the extracted character string, with the character of the other color component recognized by the other-color character recognition unit.

The image processing apparatus according to claim 1, wherein, when a predetermined symbol exists in the second area in the image data of the other color component, the correction unit inserts the character recognized by the other-color character recognition unit into the extracted character string.

The image processing apparatus according to claim 1, further comprising:
a dictionary storage unit that stores character strings that have been used by a user; and
a character string determination unit that, when the character string corrected by the correction unit is not included in the dictionary storage unit, detects a character string similar to the corrected character string from among the character strings stored in the dictionary storage unit and determines it as the extraction result of the character string included in the first area.

A program causing a computer to execute:
a character string extraction step of extracting a character string by performing character recognition processing on a predetermined first area in image data of a first color component among a plurality of color components constituting image data generated by reading an original;
an other-color character recognition step of, when an image of another color component existing in a second area including the extracted character string, in image data of another color component different from the first color component, and the extracted character string are in a predetermined relationship, performing character recognition processing on the second area in the image data of the other color component to recognize a character of the other color component; and
a correction step of correcting the extracted character string based on at least the relationship between the image of the other color component and the extracted character string.