JP2023026170A

JP2023026170A - Image processing device, image processing system, image processing method, and program

Info

Publication number: JP2023026170A
Application number: JP2021131909A
Authority: JP
Inventors: 裕介村松; Yusuke Muramatsu; 元気池田; Motoki Ikeda
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2021-08-13
Filing date: 2021-08-13
Publication date: 2023-02-24

Abstract

To improve the estimation accuracy of a handwritten character region. SOLUTION: An information processing device is configured to: extract a processing target region in a read image; estimate a handwritten portion and a background portion in the read image; extract a contour existing in the processing target region in the read image; and control correction of the estimation results on the basis of a coordinate position of the extracted contour, a coordinate position of the estimated handwritten portion, and a coordinate position of the estimated background portion. SELECTED DRAWING: Figure 9B

Description

The present invention relates to an image processing device, an image processing system, an image processing method, and a program.

In recent years, changes in the working environment brought about by the spread of computers have driven the digitization of business documents. Documents containing handwritten characters are also being digitized, and techniques for extracting handwritten characters are being studied. Patent Document 1 describes extracting handwritten characters from a document in which handwriting and printed type are mixed: thin lines are extracted, and each thin line is judged to be handwritten or not according to the variance of its luminance.

Patent Document 1: Japanese Patent Application Laid-Open No. 2010-218106

However, for characters made up of only a small number of pixels, such as parentheses, hyphens, and periods, the luminance variance shows few distinguishing features. The technique described in Patent Document 1 therefore cannot extract handwritten characters accurately.

Accordingly, an object of the present invention is to improve the estimation accuracy of a handwritten character region.

An image processing apparatus according to the present invention includes: acquisition means for acquiring a read image of a document including handwriting; first extraction means for extracting a processing target region in the read image; estimation means for estimating a handwritten portion and a background portion in the read image; second extraction means for extracting a contour existing within the processing target region in the read image; and control means for performing control so as to correct the estimation result of the estimation means on the basis of the coordinate position of the extracted contour, the coordinate position of the estimated handwritten portion, and the coordinate position of the estimated background portion.

According to the present invention, the estimation accuracy of a handwritten character region can be improved.

A diagram showing an example of the overall configuration of the image processing system.
A diagram showing an example of the hardware configuration of each device.
A diagram showing an example of the functional configuration of the learning device.
A flowchart showing the learning process.
A flowchart showing the learning data generation process.
A diagram showing examples of foreground original images.
A diagram showing examples of background original images.
A diagram for explaining data generated in the learning data generation process.
A flowchart showing the processing executed in the usage phase.
A flowchart showing details of the processing executed in FIG. 9A.
A diagram for explaining data generated in the OCR processing.
A diagram for explaining data generated in the OCR processing.
A diagram for explaining data generated in the OCR processing.
A diagram for explaining data generated in the OCR processing.
A flowchart showing the correction process according to Embodiment 2.
A diagram for explaining data generated in the OCR processing.

Embodiments of the present invention will be described below with reference to the drawings.

[Embodiment 1]
<Image processing system>
This embodiment describes a method in which a neural network that estimates handwritten pixels, trained on synthetically generated learning data, is used to extract handwritten character regions from a form filled in by hand; the entries are then recognized as text and stored in a DB (database).
FIG. 1 is a diagram showing an example of the overall configuration of the image processing system according to this embodiment. The image processing system 100 includes an image processing apparatus 101, a learning device 102, an image processing server 103, a printed-character OCR server 104, a handwriting OCR server 105, and a DB server 106, which are connected to one another via a network 107.
The image processing apparatus 101 is a digital multifunction device having a scan function and a print function, for example an MFP (Multi Function Peripheral).

In the learning phase of the image processing system 100, the image processing apparatus 101 scans a document in which only handwriting has been entered on blank paper and generates image data (hereinafter referred to as a "foreground original image"). The image processing apparatus 101 scans a plurality of such documents to obtain a plurality of foreground original images. A foreground original image is an example of a first read image. The image processing apparatus 101 also prints an electronic document to output a printed document, and then scans the printed document to generate image data (hereinafter referred to as a "background original image"). The image processing apparatus 101 scans a plurality of printed documents to obtain a plurality of background original images. A background original image is an example of a second read image. The image processing apparatus 101 transmits the foreground original images and the background original images to the learning device 102 via the network 107.
The learning device 102 accumulates the foreground original images and background original images received from the image processing apparatus 101 and composites the accumulated images to generate the learning data used to train a neural network for handwriting extraction. It then trains the neural network with the generated learning data and produces a learning result (the parameters of the neural network and the like).

In the usage phase of the image processing system 100, when handwriting extraction is to be performed, the image processing apparatus 101 scans a form containing handwriting and generates a read image to be processed (hereinafter referred to as a "processing target image"). The image processing apparatus 101 is an example of an image generating device. The image processing apparatus 101 transmits the processing target image to the image processing server 103 via the network 107.
The image processing server 103 performs handwriting extraction on the processing target image received from the image processing apparatus 101. For this purpose, the learning device 102 transmits the learning result to the image processing server 103 via the network 107. Using the learning result received from the learning device 102, the image processing server 103 performs inference with the neural network and estimates the handwritten pixels in the processing target image. Based on the estimation result, the image processing server 103 then extracts the region to be subjected to printed-character OCR and the region to be subjected to handwriting OCR, and transmits information on these regions, together with the processing target image, to the printed-character OCR server 104 and the handwriting OCR server 105. The image processing server 103 is an example of an image processing apparatus.

The printed-character OCR server 104 performs OCR (Optical Character Recognition/Reader) suited to recognizing printed characters on the printed characters contained in the processing target image. It receives from the image processing server 103 the processing target image and information on the region of that image containing the printed characters to be recognized (hereinafter referred to as a "printed-character target region"). It then performs OCR on the printed-character target region in the processing target image to obtain text data, and transmits the text data to the image processing server 103.
The handwriting OCR server 105 performs OCR suited to recognizing handwritten characters on the handwritten characters contained in the processing target image. It receives from the image processing server 103 the processing target image and information on the region of that image containing the handwritten characters to be recognized (hereinafter referred to as a "handwriting target region"). It then performs OCR on the handwriting target region in the processing target image to obtain text data, and transmits the text data to the image processing server 103. The printed-character OCR server 104 and the handwriting OCR server 105 are examples of OCR devices.
The image processing server 103 integrates the text data received from the printed-character OCR server 104 and the handwriting OCR server 105 and transmits it to the DB server 106.
The DB server 106 stores the text data received from the image processing server 103 in the DB as information indicating the content entered on the form. The information stored in the DB can be referenced from other systems.
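The usage-phase data flow described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the function `process_form`, the `OcrTargets` container, the `(x, y, width, height)` region format, and the dictionary used for the merged text are all hypothetical names chosen here, and the servers are represented by plain callables rather than network services.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Assumed region format: (x, y, width, height). The source does not specify one.
Region = Tuple[int, int, int, int]

@dataclass
class OcrTargets:
    """Hypothetical container for the two kinds of target regions."""
    printed: List[Region]
    handwritten: List[Region]

def process_form(
    image: object,
    extract_targets: Callable[[object], OcrTargets],
    printed_ocr: Callable[[object, Region], str],
    handwritten_ocr: Callable[[object, Region], str],
    save: Callable[[Dict[str, List[str]]], None],
) -> Dict[str, List[str]]:
    """Route each region to the matching OCR engine, merge the text, store it."""
    # Role of the image processing server 103: find both kinds of region.
    targets = extract_targets(image)
    merged = {
        # Role of the printed-character OCR server 104.
        "printed": [printed_ocr(image, r) for r in targets.printed],
        # Role of the handwriting OCR server 105.
        "handwritten": [handwritten_ocr(image, r) for r in targets.handwritten],
    }
    # Role of the DB server 106: persist the merged text.
    save(merged)
    return merged

# Usage with stand-in callables instead of real servers.
db: List[Dict[str, List[str]]] = []
result = process_form(
    image="scanned-form",
    extract_targets=lambda img: OcrTargets([(0, 0, 100, 20)], [(0, 30, 100, 20)]),
    printed_ocr=lambda img, region: "Name:",
    handwritten_ocr=lambda img, region: "Taro",
    save=db.append,
)
```

Passing the engines in as callables keeps the routing logic separate from how each server is reached, which mirrors the separation of roles in FIG. 1.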

<Hardware configuration of each device>
Next, the hardware configuration of each device in the image processing system 100 described above will be described with reference to FIG. 2. FIG. 2(a) shows an example of the hardware configuration of the image processing apparatus 101. FIG. 2(b) shows an example of the hardware configuration of the learning device 102. FIG. 2(c) shows an example of the hardware configuration of the image processing server 103. The hardware configurations of the printed-character OCR server 104, the handwriting OCR server 105, and the DB server 106 are the same as that of the image processing server 103, and their description is omitted.

As shown in FIG. 2(a), the image processing apparatus 101 includes a CPU 201, a ROM 202, a RAM 204, a printer device 205, a scanner device 206, a document conveying device 207, a storage 208, an input device 209, a display device 210, and an external interface 211. These devices are connected via a data bus 203 so that they can communicate with one another.

The CPU 201 is a controller for overall control of the image processing apparatus 101. The CPU 201 starts an OS (operating system) using a boot program stored in the ROM 202. A controller program stored in the storage 208 is executed on this OS. The controller program is a program for controlling the image processing apparatus 101. The CPU 201 performs overall control of the devices connected via the data bus 203. The RAM 204 operates as a temporary storage area, such as the main memory and work area of the CPU 201.

The printer device 205 prints image data on paper (recording material, sheet) under the control of the CPU 201. Printing methods include electrophotographic printing, which uses a photosensitive drum or photosensitive belt, and inkjet printing, in which ink is ejected from an array of fine nozzles to print an image directly on the paper; any method may be used. Under the control of the CPU 201, the scanner device 206 scans a document such as a sheet of paper using an optical reading device such as a CCD, obtains electrical signal data, and converts it to generate image data (a read image). The document conveying device 207, such as an ADF (auto document feeder), conveys the documents placed on its document tray to the scanner device 206 one sheet at a time.

The storage 208 is a readable and writable nonvolatile memory such as an HDD or SSD, and stores various data including the controller program mentioned above. The input device 209 is an input device composed of a touch panel, hard keys, and the like. The input device 209 accepts operation instructions from the user and passes instruction information, including the indicated position, to the CPU 201. The display device 210 is a display such as an LCD or CRT, and displays the display data generated by the CPU 201. The CPU 201 determines which operation has been performed from the instruction information received from the input device 209 and the display data currently shown on the display device 210. According to the result of this determination, the CPU 201 controls the image processing apparatus 101, generates new display data, and causes the display device 210 to display it.

Under the control of the CPU 201, the external interface 211 transmits and receives various data, including image data, to and from external devices via a network such as a LAN, a telephone line, or short-range wireless communication such as infrared. The external interface 211 receives PDL data from external devices such as the learning device 102 or a PC (not shown). The CPU 201 interprets the PDL data received by the external interface 211 and generates an image. The CPU 201 prints the generated image with the printer device 205 or stores it in the storage 208. The external interface 211 also receives image data from external devices such as the image processing server 103. The CPU 201 prints the received image data with the printer device 205, stores it in the storage 208, or transmits it to another external device through the external interface 211.

As shown in FIG. 2(b), the learning device 102 includes a CPU 231, a ROM 232, a RAM 234, a storage 235, an input device 236, a display device 237, an external interface 238, and a GPU 239. These units can exchange data with one another via a data bus 233.
The CPU 231 is a controller for controlling the learning device 102 as a whole. The CPU 231 starts the OS using a boot program stored in the ROM 232, which is a nonvolatile memory. On this OS, the CPU 231 executes a learning data generation program and a learning program stored in the storage 235. By executing the learning data generation program, the CPU 231 realizes the function of a learning data generation unit 301 (FIG. 3). By executing the learning program, the CPU 231 realizes the function of a learning unit 302 (FIG. 3) that trains the neural network for estimating handwritten pixels. The CPU 231 controls each unit via a bus such as the data bus 233. The RAM 234 operates as a temporary storage area, such as the main memory and work area of the CPU 231.

The storage 235 is a readable and writable nonvolatile memory. It stores the learning data generation program and the learning program mentioned above, the foreground original images and background original images generated by the image processing apparatus 101, and the learning data generated by the learning data generation process described later (FIG. 5).
The input device 236 is an input device composed of a mouse, a keyboard, and the like. The display device 237 is the same as the display device 210 described with reference to FIG. 2(a).
The external interface 238 is the same as the external interface 211 described with reference to FIG. 2(a).
The GPU 239 is an image processing processor that cooperates with the CPU 231 to generate image data and train the neural network.

As shown in FIG. 2(c), the image processing server 103 includes a CPU 261, a ROM 262, a RAM 264, a storage 265, an input device 266, a display device 267, and an external interface 268. These units can exchange data with one another via a data bus 263.
The CPU 261 is a controller for controlling the image processing server 103 as a whole. The CPU 261 starts the OS using a boot program stored in the ROM 262, which is a nonvolatile memory. On this OS, the CPU 261 executes an image processing program stored in the storage 265. By executing this image processing program, the CPU 261 realizes the processing shown in the flowcharts described later. The CPU 261 controls each unit via a bus such as the data bus 263.
The RAM 264 operates as a temporary storage area, such as the main memory and work area of the CPU 261. The storage 265 is a readable and writable nonvolatile memory and stores the image processing program mentioned above.
The input device 266 is the same as the input device 236 described with reference to FIG. 2(b). The display device 267 is the same as the display device 210 described with reference to FIG. 2(a).
The external interface 268 is the same as the external interface 211 described with reference to FIG. 2(a).

FIG. 3 is a block diagram showing an example of the functional configuration of the learning device 102. The learning device 102 has the functions of a learning data generation unit 301 and a learning unit 302. The function of the learning data generation unit 301 is realized when the CPU 231 loads the learning data generation program stored in the storage 235 into the RAM 234 and executes it. Likewise, the function of the learning unit 302, which trains the neural network for estimating handwritten pixels, is realized when the CPU 231 loads the learning program stored in the storage 235 into the RAM 234 and executes it. The CPU 231 also executes part of the computation performed by the learning data generation unit 301 and the learning unit 302 in cooperation with the GPU 239.
The learning data generation unit 301 generates the learning data for training the neural network.
The learning unit 302 trains the neural network using the learning data generated by the learning data generation unit 301.

Next, the processing that the image processing system 100 according to this embodiment executes in the learning phase will be described with reference to FIGS. 4 to 8.
<Learning processing>
FIG. 4 is a flowchart showing the learning process. This flowchart is realized by the learning unit 302 of the learning device 102. It starts when the user performs a predetermined operation via the input device 209 of the image processing apparatus 101. In this embodiment, the mini-batch method is used to train the neural network. In the following, each process (step) is denoted by a reference sign prefixed with S, and the word "step" is omitted.

First, in S401, the CPU 231 initializes the neural network. Specifically, the CPU 231 constructs the neural network and initializes the value of each of its parameters to a randomly determined value. Various structures can be used for the neural network; for example, it can take the form of an FCN (Fully Convolutional Networks).
In S402, the CPU 231 acquires learning data. The CPU 231 executes the learning data generation process to acquire a predetermined number (the mini-batch size, for example 10) of learning data items. Alternatively, a mini-batch of learning data may be taken from a large amount of learning data generated in advance by the learning data generation process. The learning data generation process will be described later with reference to FIG. 5.
In S403, the CPU 231 calculates the error of the neural network. Specifically, the CPU 231 inputs the input image contained in each learning data item to the neural network and obtains an output. The output is an image of the same size as the input image in which, as the prediction result, pixels judged to be handwriting have a value indicating handwriting and pixels judged not to be handwriting have a value indicating non-handwriting. The CPU 231 then evaluates the difference between this output and the correct label image to obtain the error. Cross-entropy can be used as the index for this evaluation.
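As an illustration of the error evaluation in S403, the following is a minimal sketch of a per-pixel binary cross-entropy. It assumes that the network output is available as a map of handwriting probabilities between 0 and 1 and that the correct label image holds 1 for handwriting and 0 otherwise; the source does not state the output encoding, and the function name `pixel_cross_entropy` and the clamping constant `eps` are choices made here.

```python
import math

def pixel_cross_entropy(pred, label, eps=1e-7):
    """Mean binary cross-entropy between a predicted probability map and a
    label image of the same size (both given as lists of rows)."""
    total = 0.0
    count = 0
    for pred_row, label_row in zip(pred, label):
        for p, y in zip(pred_row, label_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count

# A 2x2 example: the first column is handwriting, the second is not.
pred = [[0.9, 0.1],
        [0.8, 0.2]]
label = [[1, 0],
         [1, 0]]
loss = pixel_cross_entropy(pred, label)
```

The loss shrinks toward zero as the predicted map approaches the label image, which is what makes it usable as the quantity to minimize in S404.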

In S404, the CPU 231 adjusts the parameters of the neural network. Specifically, the CPU 231 changes the parameter values of the neural network by backpropagation, based on the error calculated in S403.
In S405, the CPU 231 determines whether to end the learning, as follows. The CPU 231 determines whether the processing of S402 to S404 has been performed a predetermined number of times (for example, 60,000 times). This number can be set in advance, for example by a user operation input at the start of this flowchart. If the CPU 231 determines that the number of iterations has reached the predetermined number, the process proceeds to S406. If not, the process returns to S402 and the CPU 231 continues training the neural network.
In S406, the CPU 231 transmits the neural network parameters adjusted in S404 to the image processing server 103 as the learning result. The processing of this flowchart then ends.
With the learning process described above, a neural network that estimates handwritten pixels can be trained using the learning data.
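The control flow of S401 to S406 can be sketched as a small mini-batch loop. This is not the network of the embodiment: to stay self-contained, the sketch swaps the FCN for a two-parameter per-pixel logistic model and replaces the learning data generation process with a toy generator (`make_sample`) that labels a pixel as handwriting when its darkness exceeds 0.5. The learning rate, the iteration count of 300, and all function names are assumptions made for illustration; only the mini-batch size of 10 is taken from the text.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_sample(rng, size=4):
    """Hypothetical stand-in for learning data generation: returns a flat list
    of pixel darkness values in [0, 1] and the matching correct labels."""
    pixels = [rng.random() for _ in range(size * size)]
    labels = [1 if v > 0.5 else 0 for v in pixels]  # toy labelling rule
    return pixels, labels

def train(iterations=300, batch_size=10, lr=1.0, seed=0):
    rng = random.Random(seed)
    # S401: initialise the parameters with random values.
    w, b = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    for _ in range(iterations):  # S405: stop after a fixed number of rounds
        # S402: obtain one mini-batch of learning data.
        batch = [make_sample(rng) for _ in range(batch_size)]
        grad_w = grad_b = 0.0
        n = 0
        # S403: forward pass; accumulate the cross-entropy gradient.
        for pixels, labels in batch:
            for x, y in zip(pixels, labels):
                p = sigmoid(w * x + b)
                grad_w += (p - y) * x
                grad_b += (p - y)
                n += 1
        # S404: adjust the parameters (plain gradient descent here).
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    # S406: the learned parameters are the learning result.
    return w, b

w, b = train()
```

For this one-layer model the gradient can be written in closed form, so no explicit backpropagation through layers is needed; a real FCN would rely on a deep learning framework for that step.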

<Learning data generation processing>
Next, the learning data generation process executed in S402 of FIG. 4 will be described. FIG. 5 is a flowchart showing the learning data generation process. This flowchart is realized by the learning data generation unit 301 of the learning device 102. FIGS. 6 to 8 are diagrams for explaining the data generated in the learning data generation process.
First, in S501, the CPU 231 selects and reads out a foreground original image stored in the storage 235. FIG. 6 shows examples of foreground original images. A foreground original image is an image containing only handwriting, generated by scanning, with the image processing apparatus 101, a document in which only handwriting has been entered on blank paper. A plurality of foreground original images are assumed to be recorded in the storage 235 in advance. In this step, one of them is selected at random. The following description assumes that the foreground original image 601 has been selected from among the foreground original images 601 to 603 in FIG. 6.
In S502, the CPU 231 processes the foreground original image read out in S501 by rotating it. The rotation angle is selected at random from a predetermined range (for example, between -10 degrees and 10 degrees).
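S501 and S502 can be sketched as follows. The sketch assumes grayscale images stored as lists of rows, fills uncovered corners with white (255), and uses nearest-neighbour sampling; none of these details is given in the source, and the helper name `rotate` is hypothetical. The sign convention of the angle is likewise an arbitrary choice of this sketch.

```python
import math
import random

def rotate(image, angle_deg, fill=255):
    """Rotate a grayscale image (list of rows) about its centre using
    nearest-neighbour sampling; uncovered pixels are set to `fill`."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: locate the source pixel for each output pixel.
            dx, dy = x - cx, y - cy
            sx = int(round(cos_a * dx + sin_a * dy + cx))
            sy = int(round(-sin_a * dx + cos_a * dy + cy))
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = image[sy][sx]
    return out

rng = random.Random(0)
# Three tiny stand-ins for the stored foreground original images.
originals = [[[i * 10 + j for j in range(3)] for i in range(3)] for _ in range(3)]
selected = rng.choice(originals)   # S501: pick one original at random
angle = rng.uniform(-10.0, 10.0)   # S502: angle from the example range
rotated = rotate(selected, angle)
```

Inverse mapping is used so that every output pixel receives exactly one value, which avoids the holes that forward mapping can leave.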

In S503, the CPU 231 generates image data by cutting out a portion of the foreground original image (for example, height × width = 512 × 512) (this image data is hereinafter referred to as the "foreground image"). The cutout position is determined randomly.
In S504, the CPU 231 processes the foreground image generated in S503 by scaling it. The scaling factor is randomly selected from a predetermined range (for example, between 50% and 150%). Furthermore, a portion of the scaled foreground image (for example, height × width = 256 × 256) is cut out from the center, and the foreground image is updated with it.
In S505, the CPU 231 processes the foreground image by changing the luminance of each pixel. The CPU 231 converts the foreground image to grayscale and then changes its luminance using gamma correction. The gamma value is randomly selected from a predetermined range (for example, between 0.1 and 10.0). An example of the foreground image at this point is shown in FIG. 8(a).
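The cutting, scaling, and luminance steps of S503 to S505 can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: images are plain lists of 8-bit grayscale rows, the function names are hypothetical, nearest-neighbor sampling is assumed for scaling, and gamma correction is assumed to take the form out = 255 × (in / 255)^gamma, which the description does not specify.

```python
import random

def random_crop(img, size, rng):
    """Cut a size x size patch out of img (a list of pixel rows) at a random position."""
    top = rng.randint(0, len(img) - size)
    left = rng.randint(0, len(img[0]) - size)
    return [row[left:left + size] for row in img[top:top + size]]

def scale_nearest(img, factor):
    """Resize img by factor using nearest-neighbor sampling (an assumed interpolation)."""
    h, w = len(img), len(img[0])
    new_h, new_w = max(1, round(h * factor)), max(1, round(w * factor))
    return [[img[min(h - 1, int(y / factor))][min(w - 1, int(x / factor))]
             for x in range(new_w)] for y in range(new_h)]

def center_crop(img, size):
    """Cut a size x size patch out of the center of img."""
    top = (len(img) - size) // 2
    left = (len(img[0]) - size) // 2
    return [row[left:left + size] for row in img[top:top + size]]

def gamma_correct(img, gamma):
    """Apply the assumed form out = 255 * (in / 255) ** gamma via a lookup table."""
    lut = [round(255 * (v / 255) ** gamma) for v in range(256)]
    return [[lut[v] for v in row] for row in img]

def augment(original, crop=512, out=256, rng=random):
    """Hypothetical helper chaining S503 to S505 with the example ranges from the text."""
    patch = random_crop(original, crop, rng)              # S503: random cut-out
    patch = scale_nearest(patch, rng.uniform(0.5, 1.5))   # S504: random scaling
    patch = center_crop(patch, out)                       # S504: central cut-out
    return gamma_correct(patch, rng.uniform(0.1, 10.0))   # S505: luminance change
```

With the example sizes, the smallest scaling factor (50%) turns the 512 × 512 patch into 256 × 256, so the central 256 × 256 cut-out always fits.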

In S506, the CPU 231 selects and reads out a background original image stored in the storage 235. FIG. 7 shows an example of a background original image. A background original image is generated by scanning, as is, a document printed from an electronic document by the image processing apparatus 101. This document contains no handwriting and contains only objects that are printed on a form, such as printed characters and ruled lines. In this embodiment, an electronic document whose characteristics (size of printed characters, presence or absence of ruled lines, and so on) are similar to those of the forms scanned in the usage phase is used. In addition, since handwritten characters entered on a form are the target, the regions of the form in which handwriting is entered are used as background original images. It is assumed that a plurality of background original images are recorded in the storage 235 in advance. In this step, one is randomly selected from the plurality of background original images.
In S507, the CPU 231 processes the background original image read out in S506 by rotating it. The rotation angle is randomly selected from a predetermined range (for example, between -10 degrees and 10 degrees).
In S508, the CPU 231 generates image data by cutting out a portion of the background original image (the same size as when the foreground image was cut out in S503) (this image data is hereinafter referred to as the "background image"). The cutout position is determined randomly.
In S509, the CPU 231 processes the background image generated in S508 by scaling it. The scaling factor is randomly selected from a predetermined range (for example, between 50% and 150%). Furthermore, a portion of the scaled background image (the same size as when the foreground image was cut out in S504) is cut out from the center, and the background image is updated with it.
In S510, the CPU 231 processes the background image by changing the luminance of each pixel. The CPU 231 converts the background image to grayscale and then changes its luminance using gamma correction. The gamma value is randomly selected from a predetermined range (for example, between 0.1 and 10.0). An example of the background image at this point is shown in FIG. 8(b).

Through the steps described above, the foreground image and the background image to be combined when generating learning data in the subsequent steps are obtained. In this embodiment, the learning device 102 applies rotation, scaling, and luminance changes to each of the foreground image and the background image. This gives the learning data diversity and can improve the generalization performance of the neural network trained with that learning data. Note that the image processing performed on the foreground image and the background image is not limited to rotation, scaling, and luminance change. Any of rotation, scaling, and luminance change may also be performed selectively. In addition, the learning device 102 does not use the foreground original image and the background original image at their original size, but randomly cuts out smaller partial images from them. This takes into account efficiency when the images are loaded into the RAM 234 and referenced by the CPU 231 and the GPU 239 during the learning process, and also makes it possible to generate multiple, diverse pieces of learning data from each single foreground original image and background original image.

In S511, the CPU 231 generates a correct label image for the foreground image. First, the CPU 231 binarizes the foreground image. It then generates, as the correct label image for the foreground image, image data in which pixels whose values are lower than a predetermined threshold are set to a value indicating handwriting (for example, 255; the same applies hereinafter) and all other pixels are set to a value indicating non-handwriting (for example, 0; the same applies hereinafter). FIG. 8(c) shows an example of the correct label image generated from the foreground image of FIG. 8(a). The white pixels in FIG. 8(c) are the pixels having the value indicating handwriting.
In S512, the CPU 231 generates the input image of the learning data. The CPU 231 compares the foreground image and the background image at each coordinate and combines them by creating a new image that adopts, at each coordinate, the pixel value with the lower luminance. FIG. 8(d) shows an example of the input image generated by combining the foreground image of FIG. 8(a) and the background image of FIG. 8(b).
In S513, the CPU 231 associates the input image generated by combination in S512 with the correct label image (correct data) generated in S511, and stores them in a predetermined area of the storage 235 as learning data. The CPU 231 repeatedly executes the series of processes shown in this flowchart until a predetermined number of pieces of learning data have been generated.
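The label generation of S511 and the combination of S512 amount to a per-pixel threshold and a per-pixel minimum. The sketch below illustrates this on grayscale images held as lists of rows; the function names and the threshold of 128 are assumptions made for illustration, since the description only refers to a predetermined threshold.

```python
HANDWRITING = 255      # label value indicating handwriting (example value from the text)
NOT_HANDWRITING = 0    # label value indicating non-handwriting

def make_label(foreground, threshold=128):
    """S511: pixels darker than the (assumed) threshold are labeled as handwriting."""
    return [[HANDWRITING if v < threshold else NOT_HANDWRITING for v in row]
            for row in foreground]

def composite(foreground, background):
    """S512: at each coordinate, adopt the pixel value with the lower luminance."""
    return [[min(f, b) for f, b in zip(frow, brow)]
            for frow, brow in zip(foreground, background)]

def make_training_pair(foreground, background):
    """S513: associate the composed input image with its correct label image."""
    return composite(foreground, background), make_label(foreground)
```

Taking the darker pixel keeps both the handwritten strokes and the printed objects visible in the input image, while the label is derived from the foreground alone, so printed objects never receive the handwriting label.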

According to the learning data generation process described above, it is possible to generate learning data for training a neural network that estimates handwritten pixels.

Next, processing executed in the usage phase by the image processing system 100 according to the present embodiment will be described with reference to FIGS. 9A to 13. In the usage phase, first, the image processing apparatus 101 scans a document containing printed characters and handwritten characters to generate a processing target image. The processing target image is then transmitted to the image processing server 103 to request OCR of the printed and handwritten characters.

<OCR request processing>
FIG. 9A(a) is a flowchart showing the OCR request processing. This flowchart is implemented by the CPU 201 of the image processing apparatus 101 reading out a controller program recorded in the storage 208, loading it into the RAM 204, and executing it. This flowchart starts when the user performs a predetermined operation via the input device 209 of the image processing apparatus 101.

In S901, the CPU 201 controls the scanner device 206 and the document conveying device 207 to scan a document and generate a processing target image (read image). The processing target image is generated as full-color (RGB, 3-channel) image data. FIG. 10(a) shows an example of a document to be scanned. As shown in FIG. 10(a), the document is a form such as a registration form, and handwritten characters are entered to the right of each item on the form.
In S902, the CPU 201 transmits the processing target image generated in S901 to the image processing server 103 via the external interface 211. After that, the processing of this flowchart ends.

<OCR processing>
Next, OCR processing by the image processing server 103 will be described. The image processing server 103 receives a processing target image from the image processing apparatus 101 and obtains text data by performing OCR on the printed characters and handwritten characters included in the processing target image. OCR of printed characters is executed by the printed character OCR server 104. OCR of handwritten characters is executed by the handwriting OCR server 105. FIG. 9A(b) is a flowchart showing this OCR processing. This flowchart is implemented by the CPU 261 reading out an image processing program stored in the storage 265, loading it into the RAM 264, and executing it. This flowchart starts when the user turns on the power of the image processing server 103.

First, in S921, the CPU 261 loads the neural network that estimates handwritten pixels. The CPU 261 first constructs the same neural network as in S401 of the flowchart of FIG. 4. It then reflects, in the constructed neural network, the learning results (neural network parameters) transmitted from the learning device 102 in S406 of the flowchart of FIG. 4.
In S922, the CPU 261 determines whether a processing target image has been received from the image processing apparatus 101 via the external interface 268. If a processing target image has been received, the process transitions to S923. If not, the process transitions to S937. In the following description, it is assumed that the processing target image obtained by scanning the document shown in FIG. 10(a) has been received.

In S923, the CPU 261 performs processing target region extraction processing on the processing target image received in S922, and extracts the regions of handwritten characters and printed characters included in the processing target image as processing target regions. Details of the processing target region extraction processing will be described later with reference to FIG. 9B(a). The processing target regions obtained as a result of performing the processing target region extraction processing on the processing target image of FIG. 10(a) are shown as regions 1201 to 1205 in FIG. 12(a).
In S924, the CPU 261 estimates the handwritten portion of the processing target image received in S922. First, the CPU 261 converts the processing target image to grayscale. The grayscale processing target image is then input to the neural network constructed in S921 to estimate, for each pixel, whether it is handwriting. As a result, image data is obtained that has the same size as the processing target image and in which a value indicating handwriting (for example, 1) is recorded for each pixel determined to be handwriting and a value indicating non-handwriting (for example, 0) is recorded for each pixel determined not to be handwriting. Hereinafter, this image data is referred to as the "estimation map". Pixels determined to be handwriting are referred to as "estimated handwriting pixels", and pixels determined not to be handwriting are referred to as "estimated background pixels". FIG. 10(b) shows the estimation map for the processing target image of FIG. 10(a). In FIG. 10(b), to make the result easier to see, pixels with a value of 0 are shown as black pixels and pixels with a value of 1 are shown as white pixels. At this point, parts of the printed hyphens and parentheses in the telephone number entry field are erroneously determined to be handwriting.
In S925, the CPU 261 performs correction processing on the estimation result of S924 to correct errors in the handwriting determination. Details of the correction processing will be described later with reference to FIG. 9B(b). FIG. 10(c) shows an image representing the result of correcting the estimation map of FIG. 10(b). It can be seen that the erroneously determined portions have been corrected by the correction processing.

In S926, the CPU 261 generates an image in which only the handwriting is extracted, using the estimation map corrected in S925 as a mask. Specifically, the CPU 261 first generates an image of the same size as the processing target image, then assigns the pixel values of the processing target image at the coordinates of estimated handwriting pixels and assigns 255 at the coordinates of estimated background pixels. Hereinafter, this image is referred to as the "handwriting extraction image". FIG. 11(a) shows the handwriting extraction image generated using the corrected estimation map of FIG. 10(c) as a mask.
In S927, the CPU 261 performs the processing target region extraction processing on the handwriting extraction image, and determines the regions in the handwriting extraction image that are to be subjected to handwriting OCR (handwriting target regions). The details of this processing are the same as in S923 and will be described later with reference to FIG. 9B(a). The handwriting target regions obtained as a result of performing the processing target region extraction processing on the handwriting extraction image of FIG. 11(a) are shown as regions 1221 to 1224 in FIG. 12(b).
In S928, the CPU 261 transmits the handwriting target regions and the handwriting extraction image, as the target of handwriting OCR, to the handwriting OCR server 105 via the external interface 268, and causes it to execute handwriting OCR. The CPU 261 of the handwriting OCR server 105 executes handwriting OCR processing on the received target. Handwriting OCR can be realized by applying known techniques.
In S929, the CPU 261 determines whether a handwriting OCR result has been received from the handwriting OCR server 105. The handwriting OCR result is text data obtained by the handwriting OCR server 105 recognizing the handwritten characters included in the handwriting target regions. The CPU 261 repeats the processing of S929 until it determines that the handwriting OCR result has been received from the handwriting OCR server 105 via the external interface 268. When it determines that the result has been received, the process transitions to S930.

Subsequently, in S930, the CPU 261 generates an image in which only the background is extracted, using the estimation map corrected in S925 as a mask. Specifically, the CPU 261 first generates an image of the same size as the processing target image, then assigns the pixel values of the processing target image at the coordinates of estimated background pixels and assigns 255 at the coordinates of estimated handwriting pixels. Hereinafter, this image is referred to as the "background extraction image". FIG. 11(b) shows the background extraction image generated using the corrected estimation map of FIG. 10(c) as a mask.
In S931, the CPU 261 performs the processing target region extraction processing on the background extraction image, and determines the regions in the background extraction image that are to be subjected to printed character OCR (printed character target regions). The details of this processing are the same as in S923 and will be described later with reference to FIG. 9B(a). The printed character target regions obtained as a result of performing the processing target region extraction processing on the background extraction image of FIG. 11(b) are shown as regions 1241 to 1245 in FIG. 12(c).
In S932, the CPU 261 transmits the printed character target regions obtained in S931 and the background extraction image, as the target of printed character OCR, to the printed character OCR server 104 via the external interface 268, and causes it to execute printed character OCR. The CPU 261 of the printed character OCR server 104 executes printed character OCR processing on the received target. Printed character OCR can be realized by applying known techniques.
In S933, the CPU 261 determines whether a printed character OCR result has been received from the printed character OCR server 104. The printed character OCR result is text data obtained by the printed character OCR server 104 recognizing the printed characters included in the printed character target regions. The CPU 261 repeats the processing of S933 until it determines that the printed character OCR result has been received from the printed character OCR server 104 via the external interface 268. When it determines that the result has been received, the process transitions to S934.
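The masking performed in S926 and S930 differs only in which estimation map value is kept. A minimal sketch, assuming single-channel images held as lists of rows and a hypothetical function name, is:

```python
HANDWRITING = 1   # estimation map value for estimated handwriting pixels
BACKGROUND = 0    # estimation map value for estimated background pixels

def extract_with_mask(image, estimation_map, keep_value, fill=255):
    """Copy pixels whose map value equals keep_value; fill all others with 255."""
    return [[px if m == keep_value else fill for px, m in zip(img_row, map_row)]
            for img_row, map_row in zip(image, estimation_map)]

def split_image(image, estimation_map):
    """Return (handwriting extraction image, background extraction image)."""
    handwriting = extract_with_mask(image, estimation_map, HANDWRITING)  # S926
    background = extract_with_mask(image, estimation_map, BACKGROUND)    # S930
    return handwriting, background
```

Because both images are derived from the same corrected estimation map, every pixel of the processing target image appears in exactly one of the two extraction images.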

Subsequently, in S934, the CPU 261 aggregates the OCR results for each processing target region. First, for each processing target region obtained in S923 that contains only handwriting target regions obtained in S927 or only printed character target regions obtained in S931, the CPU 261 uses that OCR result as is as the OCR result of the processing target region. For a processing target region that contains both handwriting target regions and printed character target regions, the handwriting OCR results and the printed character OCR results are arranged according to their positional relationship within the processing target region, and the arranged result is used as the OCR result of the processing target region.
The processing of this step will be described with reference to FIG. 12. For example, since the processing target region 1203 contains only the handwriting target region 1221, its OCR result "Taro Tanaka" is used as the OCR result. On the other hand, since the processing target region 1205 contains the handwriting target regions 1222, 1223, and 1224 and the printed character target regions 1244 and 1245, the OCR results are arranged in the order of their original positions, and "02-(32)-1268" is used as the OCR result.
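The aggregation in S934 can be pictured as collecting the sub-region results that fall inside a processing target region and joining them by position. The sketch below is illustrative only: rectangles are (x, y, width, height) tuples, ordering is assumed to be left to right for a single text line, and the coordinates and the way the text is split across the five sub-regions are hypothetical.

```python
def contains(outer, inner):
    """True if rectangle inner lies entirely within rectangle outer."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh

def merge_ocr_results(target_region, sub_results):
    """Join the OCR texts of the sub-regions inside target_region, left to right."""
    inside = [(rect, text) for rect, text in sub_results
              if contains(target_region, rect)]
    inside.sort(key=lambda item: item[0][0])  # order by left edge (assumed rule)
    return "".join(text for _, text in inside)

# Hypothetical layout for the processing target region 1205.
region_1205 = (0, 0, 200, 30)
results = [
    ((95, 5, 40, 20), "1268"),  # handwriting target region
    ((5, 5, 20, 20), "02"),     # handwriting target region
    ((30, 5, 15, 20), "-("),    # printed character target region
    ((75, 5, 15, 20), ")-"),    # printed character target region
    ((50, 5, 20, 20), "32"),    # handwriting target region
]
merged = merge_ocr_results(region_1205, results)
```

A region containing a single sub-region falls out of the same function, since joining one text returns it unchanged.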

In S935, the CPU 261 integrates the OCR results of the processing target regions. Here, item-value pairs are estimated by evaluating the positional relationship and semantic validity of the processing target regions. For example, the processing target region closest to the region 1202 is the region 1203, and if the OCR result "name" of the region 1202 is taken to be an item name, the OCR result "Taro Tanaka" of the region 1203 contains a name and is therefore highly plausible as its value. Accordingly, the OCR result of the region 1202 and the OCR result of the region 1203 are estimated to be an item-value pair relating to the name. In the same way, the OCR results of the region 1204 and the region 1205 are also estimated to be an item-value pair.
In S936, the CPU 261 transmits the item-value pairs obtained in S935 to the DB server 106 via the external interface 268 and causes it to store them.
In S937, the CPU 261 determines whether to end the series of processes. Unless the CPU 261 detects a predetermined operation such as the image processing server 103 being powered off, the process returns to S922. When the CPU 261 detects the predetermined operation, the processing of this flowchart ends.
According to the OCR processing described above, regions containing only handwritten characters and regions containing only printed characters are extracted from a processing target image in which handwriting and printed characters are mixed, and the results of character recognition performed on each region can be stored in the DB server 106.
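The positional part of the item-value pairing in S935 can be sketched as a nearest-region search. This is only one possible reading: the description does not define how distance is measured or how semantic validity is scored, so the center-to-center distance, the function names, and the coordinates below are all assumptions, and the semantic check is omitted.

```python
import math

def center(rect):
    """Center point of a rectangle given as (x, y, width, height)."""
    x, y, w, h = rect
    return (x + w / 2, y + h / 2)

def nearest_region(item_rect, candidates):
    """Return the candidate whose center is closest to the center of item_rect."""
    item_center = center(item_rect)
    return min(candidates, key=lambda c: math.dist(item_center, center(c["rect"])))

# Hypothetical coordinates for the regions 1202, 1203, and 1205.
item_1202 = {"rect": (10, 10, 40, 20), "text": "name"}
candidates = [
    {"rect": (60, 50, 120, 20), "text": "02-(32)-1268"},  # region 1205
    {"rect": (60, 10, 100, 20), "text": "Taro Tanaka"},   # region 1203
]
value = nearest_region(item_1202["rect"], candidates)
pair = (item_1202["text"], value["text"])
```

A fuller implementation would also check that the candidate text is plausible for the item name before accepting the pair.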

<Processing target region extraction processing>
Next, the processing target region extraction processing executed in S923, S927, and S931 of FIG. 9A(b) will be described. FIG. 9B(a) is a flowchart showing the processing target region extraction processing. This flowchart is implemented by the CPU 261 reading out the image processing program stored in the storage 265, loading it into the RAM 264, and executing it. This flowchart is executed in S923, S927, and S931 of FIG. 9A(b), with the processing target image, the handwriting extraction image, and the background extraction image as input, respectively.
First, in S951, the CPU 261 applies contraction (erosion) processing to the input image. This thickens the characters, so that small parts of a character, such as radicals and dots, become connected to the surrounding strokes, which prevents them from being treated as noise in the subsequent processing (S953).
In S952, the CPU 261 obtains the circumscribed rectangles of the regions in which black pixels are connected. The CPU 261 searches the image contracted in S951 for regions in which black pixels are connected, and generates a circumscribed rectangle individually for every region found.
In S953, the CPU 261 excludes, from the circumscribed rectangles generated in S952, rectangles that are unlikely to belong to characters. For example, predetermined ranges are set for the side lengths and area of a rectangle, and rectangles outside those ranges are presumed not to be characters and are removed. This makes it possible to exclude large rectangles that enclose figures and tables as well as tiny rectangles that enclose small noise.
In S954, the CPU 261 connects neighboring circumscribed rectangles. For each rectangle remaining after S953, if another rectangle exists within a certain distance to its left or right, the CPU 261 replaces those rectangles with a new rectangle that combines them all. This makes it possible to form rectangles that enclose groups such as words or whole sentences rather than individual characters. Each rectangle obtained as a result is set as a processing target region. After that, the process returns to the flow of FIG. 9A(b).
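The four steps S951 to S954 can be sketched on a small grayscale image (0 = black, 255 = white) as follows. This is an illustrative sketch under several assumptions not stated in the description: a 3 × 3 minimum filter for the contraction, 4-connectivity for connected regions, a filter on side lengths only, and a merge rule that requires vertical overlap. All function names are hypothetical, and boxes are (left, top, right, bottom) with inclusive coordinates.

```python
from collections import deque

def erode(img):
    """S951: 3x3 minimum filter; dark strokes grow and bridge small gaps."""
    h, w = len(img), len(img[0])
    return [[min(img[j][i]
                 for j in range(max(0, y - 1), min(h, y + 2))
                 for i in range(max(0, x - 1), min(w, x + 2)))
             for x in range(w)] for y in range(h)]

def bounding_boxes(img):
    """S952: circumscribed rectangles of 4-connected black regions."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] != 0 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            l, t, r, b = x, y, x, y
            while queue:  # breadth-first flood fill over one region
                cx, cy = queue.popleft()
                l, t, r, b = min(l, cx), min(t, cy), max(r, cx), max(b, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and img[ny][nx] == 0 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((l, t, r, b))
    return boxes

def filter_boxes(boxes, min_side, max_side):
    """S953: drop rectangles whose width or height is outside the allowed range."""
    return [bx for bx in boxes
            if min_side <= bx[2] - bx[0] + 1 <= max_side
            and min_side <= bx[3] - bx[1] + 1 <= max_side]

def merge_horizontal(boxes, max_gap):
    """S954: merge rectangles that overlap vertically and lie within max_gap horizontally."""
    merged = []
    for bx in sorted(boxes):  # left to right
        for i, (l, t, r, b) in enumerate(merged):
            if bx[1] <= b and t <= bx[3] and bx[0] - r <= max_gap:
                merged[i] = (l, min(t, bx[1]), max(r, bx[2]), max(b, bx[3]))
                break
        else:
            merged.append(bx)
    return merged
```

The size limits and the merge distance would in practice be tuned to the scan resolution and the character sizes expected on the forms.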

According to the processing target region extraction processing described above, regions such as printed characters representing entry items and handwritten characters entered in entry fields can be extracted as processing target regions. In S923, the CPU 261 extracts the regions of handwritten characters and printed characters included in the processing target image. In S927, the CPU 261 extracts processing target regions from the handwriting extraction image and determines the extracted regions as handwriting target regions. In S931, the CPU 261 extracts processing target regions from the background extraction image and determines the extracted regions as printed character target regions.

<Correction processing for handwriting determination>
Next, the correction processing for correcting errors in the handwriting determination, executed in S925 of FIG. 9A(b), will be described. FIG. 9B(b) is a flowchart showing the correction processing according to the present embodiment. This flowchart is implemented by the CPU 261 reading out the image processing program stored in the storage 265, loading it into the RAM 264, and executing it. This flowchart is executed in S925 of FIG. 9A(b), with the estimation map obtained in S924 as input.

First, in S971, the CPU 261 determines whether the subsequent processing from S972 to S977 has been completed for all the processing target regions obtained in S923 of FIG. 9A(b). When the CPU 261 determines that the processing has been completed for all processing target regions, the process returns to the flow of FIG. 9A(b). On the other hand, when the CPU 261 determines that an unprocessed processing target region exists, it selects one processing target region, and the process transitions to S972. In this way, when a plurality of processing target regions have been extracted from the processing target image, the CPU 261 sequentially determines, for each of them, whether to perform correction. In the following description, it is assumed that the processing target region 1205 shown in FIG. 12(a) has been selected.
In S972, the CPU 261 extracts the contours present in the processing target region selected in S971. Contour extraction is performed by binarizing the image within the processing target region and then applying a general contour tracing algorithm or the like. FIG. 13(a) shows contours 1301 to 1312 extracted from the processing target region 1205. As shown in FIG. 13(a), regions in which character strokes are enclosed by contours (outlines) are extracted.
In S973, the CPU 261 determines whether the subsequent processing from S974 to S977 has been completed for all the contours extracted in S972. When the CPU 261 determines that the processing has been completed for all contours, the process transitions to S971. On the other hand, when the CPU 261 determines that an unprocessed contour exists, it selects one contour, and the process transitions to S974. In this way, when a plurality of contours have been extracted from the processing target region, the CPU 261 sequentially determines, for each of them, whether to perform correction. In the following description, it is assumed that the contour 1307 in FIG. 13(a) has been selected. Note that in this embodiment, when one closed region is formed by a plurality of contours, as with the contour 1301 and the contour 1312, the plurality of contours are treated as one contour.

In S974, the CPU 261 calculates the circumscribed rectangle of the contour selected in S973. The circumscribed rectangle calculated for the contour 1307 is indicated by the dotted frame 1321 in FIG. 13(a).
In S975, the CPU 261 determines whether the circumscribed rectangle calculated in S974 satisfies a predetermined condition. In this embodiment, the predetermined condition characterizes the circumscribed rectangle of the contour of a printed character that is easily confused with handwriting, for example a printed parenthesis or hyphen. The shapes of the circumscribed rectangles of printed parentheses and hyphens are tabulated in advance over various fonts and sizes, and the ranges of area and aspect ratio they can take are defined statistically. The circumscribed rectangle of a printed parenthesis is, for example, 6 to 15 pixels high and 26 to 34 pixels wide. The circumscribed rectangle of a printed hyphen is, for example, 2 to 6 pixels high and 20 to 40 pixels wide. The circumscribed rectangles of printed parentheses and hyphens tend to be elongated. If the CPU 261 determines that the circumscribed rectangle satisfies the predetermined condition, the process transitions to S976. If it determines that the condition is not satisfied, the contour selected in S973 is excluded from correction and the process transitions to S973 again.
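As a rough illustration of S974 and S975, the sketch below computes a circumscribed rectangle and tests it against the example size ranges quoted above. The region format (a list of `(y, x)` pixel coordinates), the function names, and the `SIZE_RANGES` table are assumptions introduced here; an actual system would derive its ranges statistically from fonts and sizes, as the text describes.

```python
def bounding_rect(region):
    """Return (x, y, width, height) of the circumscribed rectangle of (y, x) pixels."""
    ys = [y for y, _ in region]
    xs = [x for _, x in region]
    return min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1

# (min_height, max_height, min_width, max_width), taken from the example values in the text.
SIZE_RANGES = {
    "parenthesis": (6, 15, 26, 34),
    "hyphen": (2, 6, 20, 40),
}

def matches_confusable_print(width, height, ranges=SIZE_RANGES):
    """True if the rectangle falls in any size range of print easily confused with handwriting."""
    return any(
        min_h <= height <= max_h and min_w <= width <= max_w
        for min_h, max_h, min_w, max_w in ranges.values()
    )

# A 30 x 3 pixel bar resembles a printed hyphen; a 20 x 20 blob does not match either range.
hyphen_like = [(y, x) for y in range(10, 13) for x in range(5, 35)]
blob = [(y, x) for y in range(20) for x in range(20)]

_, _, w1, h1 = bounding_rect(hyphen_like)
_, _, w2, h2 = bounding_rect(blob)
hyphen_is_candidate = matches_confusable_print(w1, h1)
blob_is_candidate = matches_confusable_print(w2, h2)
```

Only contours for which the check succeeds would go on to S976; the others are left as estimated.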

In S976, the CPU 261 determines whether the contour selected in S973 contains both a handwritten portion and a background portion. This determination is made by comparing the coordinate position of the contour selected in S973 with the coordinate positions of the estimated handwriting pixels and of the estimated background pixels. Specifically, the CPU 261 determines whether both estimated handwriting pixels and estimated background pixels exist within the region enclosed by the extracted contour. FIG. 13(b) shows the portion of the estimation map of FIG. 10(b) that corresponds to the processing target area 1205. In FIG. 13(b), both estimated handwriting pixels and estimated background pixels exist at the position corresponding to the contour 1307 in FIG. 13(a). If the CPU 261 determines that the contour contains both estimated handwriting pixels and estimated background pixels, the process transitions to S977. If it determines that the contour contains only estimated handwriting pixels or only estimated background pixels, the contour selected in S973 is excluded from correction and the process transitions to S973 again.
In S977, the CPU 261 corrects all pixels within the contour selected in S973 to estimated background pixels. FIG. 13(c) shows the result of executing this step on the estimation map shown in FIG. 13(b). As shown in FIG. 13(c), the estimated handwriting pixels that existed within the contour 1307 have been corrected to estimated background pixels. The process then transitions to S973.
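The check in S976 and the correction in S977 can be sketched as follows. The encoding of the estimation map (a list of rows holding 1 for estimated handwriting and 0 for estimated background), the region format (a list of `(y, x)` coordinates inside the contour), and the function names are assumptions chosen for illustration, not details given in the text.

```python
HANDWRITING, BACKGROUND = 1, 0  # assumed label encoding of the estimation map

def contains_both(est_map, region):
    """S976: True if the region holds both estimated handwriting and estimated background pixels."""
    labels = {est_map[y][x] for y, x in region}
    return HANDWRITING in labels and BACKGROUND in labels

def correct_to_background(est_map, region):
    """S977: if the region is mixed, relabel every pixel in it as background (in place)."""
    if not contains_both(est_map, region):
        return False  # not a correction target; the map is left unchanged
    for y, x in region:
        est_map[y][x] = BACKGROUND
    return True

est_map = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],  # a printed hyphen, partly misjudged as handwriting
    [0, 0, 0, 1],  # a single pixel estimated as handwriting
]
hyphen_region = [(1, 0), (1, 1), (1, 2), (1, 3)]
dot_region = [(2, 3)]

hyphen_corrected = correct_to_background(est_map, hyphen_region)
dot_corrected = correct_to_background(est_map, dot_region)
```

In this toy map, the mixed hyphen region is reset to background, while the region containing only handwriting labels is not touched.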

According to the correction processing described above, the image processing server 103 corrects the estimation result when a contour satisfying a predetermined condition exists in the processing target area and that contour contains both pixels estimated to be handwriting and pixels estimated to be background. Specifically, for a contour satisfying a condition such as that of a printed parenthesis or hyphen, the estimation result is corrected so that the entire inside of the contour indicates the background portion.

According to the first embodiment described above, after the handwritten portion is estimated, a stroke in the processing target area that contains both a portion estimated to be handwriting and a portion estimated to be background can be corrected as an estimation error. This makes it possible to correct the estimated position of the handwritten portion for strokes in the processing target area, and to suppress both the loss of part of a character and the addition of extraneous parts in the handwritten image to be processed. That is, the estimation accuracy of the handwritten character area can be improved, which in turn can improve the recognition accuracy of handwritten characters. Further, in the first embodiment, when a stroke in the processing target area exhibits a predetermined feature, such as that of a printed-character stroke easily confused with handwriting, the estimation result at the position of that stroke can be corrected to the background portion. This makes it possible to correct estimation errors for printed-character strokes that are easily confused with handwriting, such as printed parentheses and hyphens, and to further improve the estimation accuracy.

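As a first modification of this embodiment, the content of the correction may differ from the above, provided that the estimation result is corrected in response to a contour extracted from the processing target area satisfying the predetermined condition and estimated handwriting pixels and estimated background pixels being mixed in the region within that contour.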

As a second modification of this embodiment, there may be a condition under which a contour is excluded from correction even if the contour extracted from the processing target area satisfies the predetermined condition and estimated handwriting pixels and estimated background pixels are mixed within it. Such a condition is, for example, that no estimated handwriting pixels exist in the vicinity outside the contour.
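One way to picture the exclusion condition of the second modification is the sketch below, which looks for estimated handwriting pixels in a band around a contour region. The text does not define how large the "vicinity" is, so the `margin` of 2 pixels, the rectangular search band, the label encoding (1 for handwriting, 0 for background), and the function names are all assumptions.

```python
HANDWRITING, BACKGROUND = 1, 0  # assumed label encoding of the estimation map

def handwriting_near_outside(est_map, region, margin=2):
    """True if an estimated handwriting pixel lies outside the region but within
    `margin` pixels of its circumscribed rectangle (margin is an assumed value)."""
    inside = set(region)
    ys = [y for y, _ in region]
    xs = [x for _, x in region]
    height, width = len(est_map), len(est_map[0])
    for y in range(max(0, min(ys) - margin), min(height, max(ys) + margin + 1)):
        for x in range(max(0, min(xs) - margin), min(width, max(xs) + margin + 1)):
            if (y, x) not in inside and est_map[y][x] == HANDWRITING:
                return True
    return False

def should_skip_correction(est_map, region, margin=2):
    """Second modification: exclude the contour when no handwriting is estimated nearby."""
    return not handwriting_near_outside(est_map, region, margin)

est_map = [[BACKGROUND] * 8 for _ in range(5)]
est_map[2][1] = HANDWRITING             # mixed labels inside the contour region
region = [(2, 1), (2, 2), (2, 3)]

skip_when_isolated = should_skip_correction(est_map, region)
est_map[3][5] = HANDWRITING             # a handwriting pixel appears next to the region
skip_with_neighbor = should_skip_correction(est_map, region)
```

A caller would evaluate `should_skip_correction` after the mixed-label check and before rewriting any labels.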

[Embodiment 2]
This embodiment differs from the first embodiment in that, in the correction processing for handwriting determination, the content of the correction is switched according to the number of estimated handwriting pixels and the number of estimated background pixels within the contour. The configuration of the image processing system 100 according to this embodiment is the same as that of the first embodiment except for the characteristic portions. The same configurations are therefore denoted by the same reference numerals, and their detailed description is omitted. The following description focuses on the differences from the first embodiment.

<Correction processing for handwriting determination>
The following describes the correction processing, executed in S925 of FIG. 9A(b), that corrects errors in handwriting determination. FIG. 14 is a flowchart showing the correction processing according to this embodiment. This flowchart is implemented by the CPU 261 reading out an image processing program stored in the storage 265, loading it into the RAM 264, and executing it. This flowchart is executed in S925 of FIG. 9A(b), taking as input the image obtained as a result of estimating handwriting pixels in S924.

S971 to S975 in FIG. 14 are the same as in the first embodiment. In the description of this embodiment, it is assumed that the estimation map shown in FIG. 15(a) is obtained as a result of estimating handwriting pixels for the processing target image of FIG. 10(a). At this point, parts of the printed hyphens and parentheses that are printed in the telephone number entry field as aids for handwritten entry are misjudged as handwriting, and part of the "1" handwritten in the telephone number entry field is misjudged as background.

Next, in S1401, the CPU 261 counts the number of estimated handwriting pixels and the number of estimated background pixels among the pixels located within the contour selected in S973.
In S1402, the CPU 261 compares the number of estimated handwriting pixels with the number of estimated background pixels counted in S1401, and determines whether the number of estimated handwriting pixels is greater than the number of estimated background pixels. If the CPU 261 determines that the number of estimated handwriting pixels is greater than the number of estimated background pixels, the process transitions to S1404. If it determines that the number of estimated handwriting pixels is less than the number of estimated background pixels, the process transitions to S1403.
FIG. 15(b) shows the portion of the estimation map of FIG. 15(a) that corresponds to the processing target area 1205. In the estimation map shown in FIG. 15(b), among the pixels within the region of the contour 1307 in FIG. 13(a), the number of estimated handwriting pixels is less than the number of estimated background pixels. The estimated background pixels are therefore in the majority, and the process transitions to S1403. On the other hand, among the pixels within the region of the contour 1309 in FIG. 13(a), the number of estimated handwriting pixels exceeds the number of estimated background pixels. The estimated handwriting pixels are therefore in the majority, and the process transitions to S1404. Although this step compares the number of estimated handwriting pixels with the number of estimated background pixels, the determination may instead be made by comparison with a predetermined number of pixels. FIG. 15(c) shows the result of executing S1403, described next, on the contour 1307 of FIG. 13(a) and S1404, described next, on the contour 1309 of FIG. 13(a) in the estimation map of FIG. 15(b).

In S1403, the CPU 261 corrects all pixels located within the contour selected in S973 so that they become estimated background pixels. If the contour 1307 is selected in S973, all estimated handwriting pixels within the contour 1307 are corrected to estimated background pixels, as shown in FIG. 15(c). The process then transitions to S973.
In S1404, the CPU 261 corrects all pixels located within the contour selected in S973 so that they become estimated handwriting pixels. If the contour 1309 is selected in S973, all estimated background pixels within the contour 1309 are corrected to estimated handwriting pixels, as shown in FIG. 15(c). The process then transitions to S973.
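The counting and branching of S1401 to S1404 amount to a majority vote within each contour, which the following sketch illustrates. The binary label encoding (1 for handwriting, 0 for background), the region format (a list of `(y, x)` coordinates), and the function names are assumptions. The text does not say what happens when the two counts are equal, so leaving the map unchanged on a tie is also an assumption.

```python
HANDWRITING, BACKGROUND = 1, 0  # assumed label encoding of a binary estimation map

def count_labels(est_map, region):
    """S1401: count estimated handwriting and estimated background pixels in the region."""
    handwriting = sum(1 for y, x in region if est_map[y][x] == HANDWRITING)
    return handwriting, len(region) - handwriting

def correct_by_majority(est_map, region):
    """S1402 to S1404: relabel the whole region with its majority label (in place)."""
    handwriting, background = count_labels(est_map, region)
    if handwriting > background:
        target = HANDWRITING   # S1404
    elif handwriting < background:
        target = BACKGROUND    # S1403
    else:
        return None            # tie: not specified in the text; left unchanged here
    for y, x in region:
        est_map[y][x] = target
    return target

est_map = [
    [1, 0, 0, 0, 0],  # printed hyphen: one pixel misjudged as handwriting
    [1, 1, 0, 1, 0],  # handwritten stroke: one pixel misjudged as background
]
hyphen_region = [(0, x) for x in range(5)]
stroke_region = [(1, x) for x in range(4)]

hyphen_label = correct_by_majority(est_map, hyphen_region)
stroke_label = correct_by_majority(est_map, stroke_region)
```

The same structure would accommodate the variant mentioned in the text, in which a count is compared with a predetermined number of pixels, by replacing the comparison between the two counts.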

According to the correction processing described above, the image processing server 103 compares, among the pixels located within a contour existing in the processing target area, the number of pixels estimated to be handwriting with the number of pixels estimated to be background, and corrects the estimation result according to the result of the comparison. Specifically, if more pixels are estimated to be handwriting, the estimation result is corrected so that the entire inside of the contour indicates the handwritten portion. If fewer pixels are estimated to be handwriting, the estimation result is corrected so that the entire inside of the contour indicates the background portion.

According to the second embodiment described above, after the handwritten portion is estimated, the number of pixels estimated to be handwriting and the number of pixels estimated to be background in a stroke in the processing target area are compared, and estimation errors can be corrected according to the result of the comparison. Specifically, the estimation result at the position of the stroke can be corrected so that the stroke is unified into whichever of the portion estimated to be handwriting and the portion estimated to be background has the larger number of pixels. This suppresses both the loss of part of a character and the addition of extraneous parts in the handwritten image to be processed. That is, the estimation accuracy of the handwritten character area can be improved.

As a first modification of this embodiment, the processing of S975 may be skipped. In this case, the image processing server 103 does not screen the extracted contours by the shape of their circumscribed rectangles, and treats as a correction target any contour whose interior contains both a portion estimated to be handwriting and a portion estimated to be background.

The present invention has been described above together with its embodiments, but the embodiments merely show specific examples of implementing the present invention, and the technical scope of the present invention should not be construed as being limited by them. That is, the present invention can be implemented in various forms without departing from its technical concept or its main features.

(Other embodiments)
The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

The present invention may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device. For example, although the image processing device 101 and the image processing server 103 have been described as separate devices, the image processing server 103 may have the functions of the image processing device 101. Although the image processing server 103, the printed character OCR server 104, and the handwritten OCR server 105 have been described as separate devices, the image processing server 103 may have the functions of the printed character OCR server 104 and the handwritten OCR server 105. Although the printed character OCR server 104 and the handwritten OCR server 105 have been described as separate devices, they may be configured as a single unit. Although the image processing server 103 and the learning device 102 have been described as separate devices, the image processing server 103 may have the functions of the learning device 102.

100: Image processing system, 101: Image processing device, 102: Learning device, 103: Image processing server, 104: Printed character OCR server, 105: Handwritten OCR server

Claims

An image processing device comprising:
acquisition means for acquiring a read image of a document including handwriting;
first extraction means for extracting a processing target area in the read image;
estimation means for estimating a handwritten portion and a background portion in the read image;
second extraction means for extracting a contour existing in the processing target area in the read image; and
control means for controlling correction of the estimation result by the estimation means based on the coordinate position of the extracted contour, the coordinate position of the estimated handwritten portion, and the coordinate position of the estimated background portion.

2. The image processing apparatus according to claim 1, wherein, when the circumscribed rectangle of the extracted contour satisfies a predetermined condition and the area of the contour includes the handwritten portion and the background portion, the control means corrects the estimation result so that the entire inside of the area of the contour becomes the background portion.

2. The control means corrects the estimation result based on the number of pixels in the handwritten portion and the number of pixels in the background portion among the pixels in the contour area. The image processing device according to .

4. The image processing apparatus according to claim 3, wherein, when the number of pixels in the handwritten portion is greater than the number of pixels in the background portion, the control means corrects the estimation result so that the entire contour area becomes the handwritten portion.

5. The image processing apparatus according to claim 3 or 4, wherein, when the number of pixels in the handwritten portion is smaller than the number of pixels in the background portion, the control means corrects the estimation result so that the entire contour area becomes the background portion.

6. The image processing apparatus according to any one of claims 3 to 5, wherein the control means performs control so as to correct the estimation result when the circumscribed rectangle of the extracted contour satisfies a predetermined condition, and performs control so as not to correct the estimation result when the circumscribed rectangle does not satisfy the predetermined condition.

7. The image processing apparatus according to claim 2, wherein said predetermined condition is a condition indicating a circumscribed rectangle of a contour of a predetermined printed character.

8. The image processing apparatus according to claim 7, wherein said predetermined characters are brackets or hyphens.

9. The image processing apparatus according to claim 2, wherein, even when the handwritten portion and the background portion are included in the area of the contour, the control means performs control so as not to correct the estimation result when the handwritten portion does not exist around the outside of the contour.

10. The image processing apparatus according to any one of claims 1 to 9, wherein, when a plurality of contours are extracted from the processing target area, the control means sequentially determines, for each of the extracted contours, whether or not to correct the estimation result.

11. The image processing apparatus according to any one of claims 1 to 10, wherein a handwritten image generated based on the estimation result corrected by the control means is subjected to OCR corresponding to handwritten characters.

12. The image processing apparatus according to any one of claims 1 to 11, wherein a background image generated based on the estimation result corrected by the control means is subjected to OCR corresponding to printed characters.

13. The image processing apparatus according to any one of claims 1 to 12, wherein the estimation means estimates pixels of the handwritten portion in the read image by inputting the read image into a neural network.

14. The image processing apparatus according to claim 13, further comprising learning means for training said neural network using learning data in which an image obtained by synthesizing a first read image, obtained by reading a document containing only handwriting, and a second read image, obtained by reading a document containing only objects other than handwriting, is used as an input image and pixels are used as correct data.

15. The image processing apparatus according to claim 14, wherein the input image is generated by synthesizing the first read image and the second read image after predetermined image processing is performed on at least one of them.

16. The image processing apparatus according to claim 15, wherein the predetermined image processing includes at least one of rotation, scaling, brightness change, and image clipping.

17. The image processing apparatus according to any one of claims 1 to 16, wherein the document is a form.

An image processing system including an image generation device that generates a read image of a document including handwriting, an image processing device, and an OCR device, wherein
the image processing device comprises:
acquisition means for acquiring the read image from the image generation device;
first extraction means for extracting a processing target area in the read image;
estimation means for estimating a handwritten portion and a background portion in the read image;
second extraction means for extracting a contour existing in the processing target area in the read image;
control means for controlling correction of the estimation result by the estimation means based on the coordinate position of the extracted contour, the coordinate position of the estimated handwritten portion, and the coordinate position of the estimated background portion; and
transmitting means for transmitting, to the OCR device, a handwritten image generated based on the estimation result corrected by the control means as a handwritten OCR target, and a background image generated based on the estimation result corrected by the control means as a printed character OCR target, and
the OCR device comprises:
receiving means for receiving the handwritten OCR target and the printed character OCR target from the image processing device; and
processing means for performing OCR corresponding to handwritten characters on the handwritten OCR target, and performing OCR corresponding to printed characters on the printed character OCR target.

An image processing method comprising:
an acquisition step of acquiring a read image of a document including handwriting;
a first extraction step of extracting a processing target area in the read image;
an estimation step of estimating a handwritten portion and a background portion in the read image;
a second extraction step of extracting a contour existing in the processing target area in the read image; and
a control step of controlling correction of the estimation result obtained in the estimation step based on the coordinate position of the extracted contour, the coordinate position of the estimated handwritten portion, and the coordinate position of the estimated background portion.

A program for causing a computer to function as each means of the image processing apparatus according to any one of claims 1 to 17.