JP2007102674A

JP2007102674A - Character image extraction device, image file generation device, character image extraction method and computer program

Info

Publication number: JP2007102674A
Application number: JP2005294758A
Authority: JP
Inventors: Yoshiaki Hirooka; 義昭廣岡
Original assignee: Konica Minolta Business Technologies Inc
Current assignee: Konica Minolta Business Technologies Inc
Priority date: 2005-10-07
Filing date: 2005-10-07
Publication date: 2007-04-19

Abstract

PROBLEM TO BE SOLVED: To successfully extract an image of only the character part from a scanned image even when the scanned image is blurred.

SOLUTION: An image forming apparatus 1 includes: a binary image generation part 172 which converts a character area image GM, that is, the image of a character area in which character images are aligned in an original image GA0, into a binary image GN on the basis of a first threshold; a character recognition processing part 173 which performs character recognition processing on the binary image GN; a character recognition ratio calculation part 174 which calculates the character recognition ratio Rc of the characters represented by the binary image GN; a character background image output part 179 which outputs the binary image GN as the extraction result of the character image of the original image GA0 when the character recognition ratio Rc of the binary image GN exceeds a second threshold; and a reexecution instruction part 176 which controls the binary image generation part 172 so as to change the first threshold and reexecute the conversion to the binary image GN when the character recognition ratio Rc of the binary image GN is less than the second threshold.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to an apparatus, a method, and the like for extracting the image of the character portion from the image of a document read by a scanner or similar device.

Scanners, which read an image drawn on a paper document and convert it into electronic data, are in widespread use. A scanner is used not only as a peripheral device for a personal computer or workstation but also as a built-in component of devices such as fax terminals, copying machines, and MFPs (Multi Function Peripherals).

A document may include images representing characters (hereinafter referred to as “character images”). When electronic data is generated from a document that includes character images, the characters in those images must be reproduced clearly. A method such as the one described in Patent Document 1 has been proposed for this purpose.

According to the method described in Patent Document 1, characters in the image data to be transmitted are recognized by character recognition means, the character recognition rate is compared with a preset value, and if the character recognition rate falls below the preset value, the transmission resolution is raised above its default value. As a result, when the document to be transmitted contains faint or small characters that are hard to recognize as characters, the resolution of the transmission image data is changed automatically and a suitable image can be transmitted.

In addition, to reduce the file size of a scanned document image, a method is used in which the whole document image is divided into character image areas, photograph areas, and so on, and the image data of each area is compressed with a compression method suited to that area. One representative file format that adopts this method is PDF (Portable Document Format), developed by Adobe Systems (see Non-Patent Document 1).

Methods for automatically determining, within a document image, the position of the character images to be subjected to character recognition processing have also been proposed, as described in Patent Documents 2 and 3.

Patent Document 1: JP 2001-103311 A
Patent Document 2: JP-A-5-252288
Patent Document 3: JP-A-7-168911
Non-Patent Document 1: "What is Adobe PDF? - PDF beginner's edition", Adobe Systems, retrieved May 14, 2005, Internet <URL: http://www.adobe.co.jp/products/acrobat/adobepdf13.html>

Documents such as books, magazines, and pamphlets often contain character images in which the background of the characters is decorated with a color or pattern. When such a document is scanned, a clear character image may not be obtained because the contrast between the characters and their background is low. Moreover, when the base of the document has a color or pattern, background removal processing may be applied excessively to the character image, thinning the strokes of the characters, so that again a clear character image may not be obtained.

When a document image is converted into electronic data, the characters must appear clearly in the character image. Furthermore, when the image data of a document image is converted into a file in a format such as PDF, an image of only the character portion, with the background portion removed, must be extracted from the character image; if the character image obtained by scanning is unclear, this extraction does not work well. However, a conventional method such as the one described in Patent Document 1 cannot generate a black-and-white character image from an unclear character image. In addition, changing the scan conditions and rescanning the document takes time and effort, which is a burden for the user, and the rescan is not guaranteed to succeed.

In view of these problems, an object of the present invention is to make it possible to successfully extract an image of only the character portion from a character image obtained by a scanner or the like, even if that character image is unclear.

A character image extraction device according to the present invention extracts a character image from an input image input by an image input device, and is characterized by comprising: binary image conversion means for executing a conversion process that converts the image of a character area, which is an area of the input image in which one or more character images are arranged, into a binary image using a first threshold as the reference; character recognition processing means for executing character recognition processing on the converted binary image of the character area; character recognition rate calculation means for calculating a character recognition rate, which is the proportion of the characters represented by the binary image of the character area that were recognized by the character recognition processing; output means for outputting the binary image as the extraction result of the character image of the input image when the character recognition rate of the binary image of the character area exceeds a second threshold; and reconversion control means for controlling the binary image conversion means so as to change the first threshold and re-execute the conversion process for the character area when the character recognition rate of the binary image of the character area falls below the second threshold.

Preferably, when the background density of the image of the character area in the input image exceeds a third threshold, the reconversion control means controls the binary image conversion means so as to raise the first threshold and re-execute the conversion process, and when the background density falls below the third threshold, it controls the binary image conversion means so as to lower the first threshold and re-execute the conversion process.

“Character recognition” generally means recognizing a character in an image and converting it into a character code such as an ASCII code. In the present invention, however, the character represented by an image need not be narrowed down to a single character; it is sufficient to narrow it down to several candidates. For example, if character recognition of an image representing the character “乙” determines that the character is one of “乙”, “2”, or “Z”, the image is regarded as having been recognized.

In the present invention, “characters” include kanji, hiragana, katakana, and alphabetic letters, as well as numerals and symbols.

According to the present invention, an image of only the character portion can be successfully extracted from a character image obtained by a scanner or the like, even if that character image is unclear.

FIG. 1 is a diagram showing an example of a network configuration including an image forming apparatus 1 according to the present invention, FIG. 2 is a diagram showing an example of the hardware configuration of the image forming apparatus 1, and FIG. 3 is a diagram showing an example of the functional configuration of the image forming apparatus 1.

The image forming apparatus 1 shown in FIG. 1 is an image processing apparatus that integrates various functions such as copying, scanning, faxing, network printing, document server, and file transfer. Such an apparatus is also called a multifunction device or an MFP (Multi Function Peripherals).

The image forming apparatus 1 is installed in offices of government agencies or companies, public facilities such as schools or libraries, stores such as convenience stores, and various other places, and can be shared by multiple users. It can also be connected via a communication line 3 to a terminal device 2 such as a personal computer or workstation. The Internet, a LAN, a public line, a dedicated line, or the like is used as the communication line 3.

As shown in FIG. 2, the image forming apparatus 1 comprises a CPU 10a, a RAM 10b, a ROM 10c, a hard disk 10d, a control circuit 10e, an operation panel 10f, a scanner 10g, a printing device (engine) 10h, a modem 10j, a network interface 10k, and the like.

The scanner 10g is a device that optically reads images such as photographs, characters, pictures, and charts drawn on the paper of a document (hereinafter simply referred to as the “document”) and converts them into electronic data. The data of the read image is stored in the RAM 10b and, as described later, is subjected to various processes and converted into a file.

The printing device 10h prints on paper an image read by the scanner 10g or an image transmitted from the terminal device 2 or the like. In the case of a color MFP, the image is printed using toners of four colors: yellow, magenta, cyan, and black.

The operation panel 10f consists of an operation unit and a display unit. A numeric keypad or the like is used as the operation unit, and a liquid crystal display or the like is used as the display unit. By operating the operation unit, the user can give the image forming apparatus 1 commands such as starting or interrupting a process, specify processing conditions such as the data destination, scan conditions, or image file format, and specify various other items. The display unit displays screens for giving messages or instructions to the user, screens for entering the type of processing and the processing conditions the user wants, screens showing the results of processing executed by the image forming apparatus 1, and so on. When a touch panel is used as the operation panel 10f, the touch panel serves as both the operation unit and the display unit. The operation panel 10f thus acts as the user interface for a user operating the image forming apparatus 1. An application program and a driver for giving commands to the image forming apparatus 1 are installed on the terminal device 2, so the user can also operate the image forming apparatus 1 remotely from the terminal device 2.

The modem 10j has a built-in NCU (Network Control Unit); it connects to other fax terminals via an analog public line and performs data control based on the facsimile protocol, modulation and demodulation of fax data, and so on. It can also connect to the Internet via an access point of an Internet service provider (ISP) and exchange files with the terminal device 2 or the like by e-mail or FTP (File Transfer Protocol). The network interface 10k is a NIC (Network Interface Card); it connects the image forming apparatus 1 to the terminal device 2 or the like via a LAN and sends and receives files.

The control circuit 10e is a circuit for controlling devices such as the hard disk 10d, the operation panel 10f, the scanner 10g, the printing device 10h, the modem 10j, and the network interface 10k.

The hard disk 10d stores programs, data, and the like for implementing the functions shown in FIG. 3, such as a preprocessing unit 101, a lightness calculation unit 102, a smoothing processing unit 103, an area detection unit 104, an area image extraction unit 105, a background density detection unit 106, a character background image generation unit 107, an image compression processing unit 108, and a file generation unit 109. These programs are loaded into the RAM 10b as needed and executed by the CPU 10a. Some or all of these programs or data may instead be stored in the ROM 10c, or some or all of the functions shown in FIG. 3 may be implemented by the control circuit 10e.

The CPU 10a also performs the basic processing of the image forming apparatus 1, such as detecting the content of user operations (that is, which key or button the user pressed), controlling each piece of hardware that makes up the image forming apparatus 1, and generating e-mail data. This processing is performed based on an operating system program stored on the hard disk 10d.

FIG. 4 is a diagram showing an example of a document image GA0, and FIG. 5 is a diagram showing examples of defective character images.

Next, the functions and processing of each unit of the image forming apparatus 1 shown in FIG. 3 are described. The preprocessing unit 101 applies to the image of the document input by the scanner 10g (hereinafter referred to as the “document image GA0”) a process that converts it to the resolution required by the processing from the lightness calculation unit 102 onward (resolution conversion processing), and then a process that removes the color and the like of the background portion of the document image GA0 (background removal processing). The “background portion of the document image GA0” means the area that contains no content (objects). For example, in the document image GA0 shown in FIG. 4, the background portion is everything outside three rectangular areas: the rectangular area containing the image of the character string “New Digital MFP XYZ-2005 Series”, the rectangular area containing the image of the character string “The price difference of digital machines … can be expected.”, and the rectangular area with the photograph of the MFP at its center.

The document image GA0 contains images of various kinds of objects, such as character images, photographs, illustrations, and graphs. Hereinafter, an area containing character images is referred to as a “character area RM”, and an area containing images of objects other than characters is referred to as a “non-character area RH”. For example, the document image GA0 in FIG. 4 has two character areas RM and one non-character area RH. The document image GA0 after resolution conversion processing and background removal processing is hereinafter referred to as the “document image GA0'”.

The lightness calculation unit 102 generates a lightness image GA1 by calculating the lightness of each pixel of the document image GA0'. The smoothing processing unit 103 applies smoothing processing to the lightness image GA1, thereby removing noise from it and correcting its contours. The lightness image GA1 after smoothing is hereinafter referred to as the “lightness image GA1'”.
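The two steps above can be sketched as follows. This is only an illustration: the text does not say how lightness is computed or which smoothing filter is used, so the HSL-style lightness formula, the 3x3 mean filter, and the function names `lightness_image` and `smooth` are all assumptions.

```python
def lightness_image(rgb_image):
    """Return a lightness image from rows of (R, G, B) tuples (0-255).

    Assumption: lightness is (max + min) / 2 as in the HSL colour model;
    the source does not state the formula it uses.
    """
    return [[(max(p) + min(p)) / 2 for p in row] for row in rgb_image]


def smooth(image):
    """Apply a 3x3 mean filter (an assumed choice of smoothing filter).

    Pixels on the image border are averaged over the neighbours that exist.
    """
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            neighbours = [
                image[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(sum(neighbours) / len(neighbours))
        result.append(row)
    return result


if __name__ == "__main__":
    ga1 = lightness_image([[(255, 0, 0), (0, 0, 0)], [(255, 255, 255), (0, 0, 0)]])
    print(smooth(ga1))
```

A mean filter suppresses isolated noise pixels at the cost of slightly blurring edges; a real device might well use a different filter.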

The area detection unit 104 determines the positions of the character areas RM and non-character areas RH contained in the document image GA0' by applying labeling processing to the lightness image GA1'. In general, labeling a character image produces many thin, irregular lines, whereas labeling an image such as a photograph produces large white or black blobs. Based on these characteristics, the unit determines the positions of the character areas RM and non-character areas RH and generates area position data 81 indicating the result. In the case of FIG. 4, area position data 81 is generated for each of the character areas RMa and RMb and the non-character area RHa.

Based on the detection result from the area detection unit 104, the area image extraction unit 105 extracts the image of each character area RM and each non-character area RH from the document image GA0'. When the document image GA0 shown in FIG. 4 is input by the scanner 10g, the images of the character areas RMa and RMb and the non-character area RHa are extracted. Hereinafter, the extracted image of a character area RM is referred to as a “character area image GM”, and the extracted image of a non-character area RH is referred to as a “non-character area image GH”. A character area image GM contains one or more character images. For example, the character area RMb contains about 120 character images, counting numerals, punctuation marks, and other characters.

To increase the compression ratio of the character area image GM, the character background image generation unit 107 separates the character area image GM into an image of only the characters (that is, an image with the background erased) and an image of only the background. In other words, the character background image generation unit 107 generates a characters-only image and a background-only image from the character area image GM.

However, if the contrast between a character and its background is low, or if excessive background removal processing has been applied, then when the characters-only image is generated from the character images in the character area image GM, the characters may be crushed because they are not separated from the background, as in FIG. 5(a), or may be faded, as in FIG. 5(b). For this reason, when generating the characters-only image, the character background image generation unit 107 performs correction (optimization) processing as necessary so that the characters become clearer. The character background image generation unit 107 is described in detail below.

FIG. 6 is a diagram showing an example of the configuration of the character background image generation unit 107, FIG. 7 is a diagram showing an example of grouping the character images contained in the character area RMb, FIG. 8 is a diagram showing an example of the density threshold table TL1, and FIG. 9 is a diagram showing examples of defective binary images GN and a good binary image GN.

As shown in FIG. 6, the character background image generation unit 107 comprises a character area image division unit 170, a character background density detection unit 171, a binary image generation unit 172, a character recognition processing unit 173, a character recognition rate calculation unit 174, a binary image quality determination unit 175, a re-execution command unit 176, a character image integration unit 177, a background image generation unit 178, a character background image output unit 179, a density threshold table TL1, and the like.

When the character area image GM contains the character images of many characters, the character area image division unit 170 groups multiple character images that are lined up horizontally or vertically, for example about 1 to 10 character images per group. For English text and the like, the character images may be grouped word by word (that is, the character images between one space and the next). For horizontal writing, character images lined up horizontally are grouped; for vertical writing, character images lined up vertically are grouped. Whether the text is written horizontally or vertically can be determined from the characteristics of how the character images are arranged. Grouping in this way produces multiple blocks, each consisting of one or more character images. The image of a generated block is hereinafter referred to as a “block image GR”. For example, the character area image GM of the character area RMb (see FIG. 4) is divided into multiple block images GR (GR1, GR2, …) as shown in FIG. 7. In addition, for each block image GR, character string position data 82 indicating its position in the character area RM is generated.
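As a rough illustration of this grouping, the sketch below splits the character bounding boxes of one text line into blocks of at most ten characters and records each block's bounding rectangle, which stands in for the character string position data 82. The box format `(x, y, width, height)`, the fixed chunk size, and the name `group_characters` are assumptions made for the example; the source does not specify how the groups are chosen.

```python
def group_characters(char_boxes, max_per_block=10):
    """Group character boxes from one text line into blocks.

    char_boxes: list of (x, y, width, height) tuples in reading order.
    Returns a list of dicts holding the grouped boxes and the bounding
    rectangle of the block as (x, y, width, height).
    """
    blocks = []
    for start in range(0, len(char_boxes), max_per_block):
        group = char_boxes[start:start + max_per_block]
        left = min(x for x, y, w, h in group)
        top = min(y for x, y, w, h in group)
        right = max(x + w for x, y, w, h in group)
        bottom = max(y + h for x, y, w, h in group)
        blocks.append({
            "chars": group,
            "position": (left, top, right - left, bottom - top),
        })
    return blocks


if __name__ == "__main__":
    # Twelve 10x10 characters on a single horizontal line.
    line = [(10 * i, 0, 10, 10) for i in range(12)]
    for block in group_characters(line):
        print(len(block["chars"]), block["position"])
```

For word-by-word grouping of English text, the same bounding-rectangle step would apply after splitting the boxes at inter-word gaps instead of at a fixed count.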

The character background density detection unit 171 detects the density of the background of the block image GR, that is, of the background portion behind the characters, for example as follows. Since the characters are normally located in the center of the block image GR, the edges of the block image GR (its top, bottom, left, and right ends) are usually character background. The unit therefore detects the density of the background portion of the block image GR by examining the pixels near its edges. The detected density is hereinafter referred to as the “character background density Hd”.
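One way to realise this edge-based estimate of Hd is sketched below. The source only says that pixels near the edges are examined; taking the median of a one-pixel border, and the name `estimate_background_density`, are assumptions for illustration.

```python
from statistics import median


def estimate_background_density(block, margin=1):
    """Estimate Hd from the pixels within `margin` of the block's edges.

    block: list of rows of density values (0-255, larger = darker).
    Assumption: the median of the border pixels is used, so that a few
    character strokes touching the edge do not skew the estimate.
    """
    height, width = len(block), len(block[0])
    border = [
        block[y][x]
        for y in range(height)
        for x in range(width)
        if y < margin or y >= height - margin
        or x < margin or x >= width - margin
    ]
    return median(border)


if __name__ == "__main__":
    # Light background (density 40) with a dark 3x3 character in the centre.
    block = [
        [40, 40, 40, 40, 40],
        [40, 200, 200, 200, 40],
        [40, 200, 200, 200, 40],
        [40, 200, 200, 200, 40],
        [40, 40, 40, 40, 40],
    ]
    print(estimate_background_density(block))
```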

The binary image generation unit 172 generates a binary image GN of each block image GR obtained by the character area image division unit 170. In this embodiment, the binary image GN of a block image GR is generated as follows.

First, the density of each pixel of the block image GR is calculated. In this embodiment, density is represented with 8 bits. If the calculated density is greater than or equal to the density threshold β, the value of the pixel is set to “1”; if it is less than the density threshold β, the value is set to “0”. Converting the value of every pixel to either “0” or “1” in this way produces the binary image GN. The density threshold β is defined by the density threshold table TL1 shown in FIG. 8. In the example of FIG. 8, its initial value is “128”.
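The thresholding rule above is small enough to state directly in code. The following sketch assumes the block image is already available as rows of 8-bit density values; the function name `binarize` is hypothetical.

```python
def binarize(block, beta=128):
    """Convert a block image of 8-bit densities into a binary image.

    A pixel becomes 1 when its density is greater than or equal to the
    density threshold beta, and 0 otherwise. The default of 128 is the
    initial value given for the density threshold table.
    """
    return [[1 if density >= beta else 0 for density in row] for row in block]


if __name__ == "__main__":
    print(binarize([[0, 127, 128, 255]]))
```

Note that a pixel whose density equals β exactly is mapped to “1”, matching the "greater than or equal to" wording of the text.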

The character recognition processing unit 173 executes character recognition processing on the binary image GN of each block image GR and determines the characters represented by the character images contained in that binary image GN. At this time, it also counts the number of character images contained.

The character recognition rate calculation unit 174 counts, among the character images contained in the binary image GN, the number of character images whose characters the character recognition processing unit 173 was able to determine (recognize). It then calculates the ratio of the number of recognized character images to the total number of character images, that is, the character recognition rate Rc. For example, if character recognition processing is executed on a binary image GN representing the 10 characters 「デジタル機の価格差は」 (“the price difference of digital machines”) and 9 of the characters are recognized, then Rc = 90%.
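The calculation of Rc can be sketched as below. The input format is an assumption: each character image is represented by the list of candidate characters the recognizer returned for it, and, following the definition of character recognition given earlier, a character image counts as recognized when at least one candidate was returned. Returning 0.0 for an empty block is also an assumption, since the source does not cover that case.

```python
def recognition_rate(candidates_per_char):
    """Return the character recognition rate Rc as a percentage.

    candidates_per_char: one list of candidate characters per character
    image; an empty list means the image could not be recognized.
    """
    total = len(candidates_per_char)
    if total == 0:
        return 0.0  # assumed behaviour for a block with no character images
    recognized = sum(1 for candidates in candidates_per_char if candidates)
    return 100.0 * recognized / total


if __name__ == "__main__":
    # Ten character images, nine recognized; the ninth one failed.
    results = [["デ"], ["ジ"], ["タ"], ["ル"], ["機"], ["の"], ["価"], ["格"], [], ["は"]]
    print(recognition_rate(results))  # 90.0
```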

If the character recognition rate Rc of a binary image GN is greater than or equal to a recognition threshold α, the binary image quality determination unit 175 determines that the binary image GN represents the characters clearly and is good. If Rc is less than the recognition threshold α, it determines that the binary image GN does not represent the characters clearly and is defective. Under this determination, a binary image GN is judged defective when, for example, it contains a high proportion of dark, crushed character images as in FIG. 9(a) or a high proportion of faint, faded character images as in FIG. 9(b). When it contains a high proportion of character images in which the characters appear clearly, as in FIG. 9(c), it is judged good. In FIGS. 9(a) to 9(c), “○” means that the character represented by the character image was recognized by the character recognition processing unit 173, and “×” means that it could not be recognized.

再実行指令部１７６は、不良と判別された二値画像ＧＮを生成し直すように二値画像生成部１７２に対して指令する。すると、二値画像生成部１７２は、その二値画像ＧＮのデータを破棄し、もう一度、二値画像ＧＮを生成する。 The re-execution command unit 176 instructs the binary image generation unit 172 to regenerate the binary image GN determined to be defective. Then, the binary image generation unit 172 discards the data of the binary image GN and generates the binary image GN again.

However, if the character background density Hd of the original block image GR of that binary image GN is equal to or higher than the dark/light boundary threshold γ, the background portion was presumably misdetected as part of the characters in the previous attempt, crushing the characters, so the density threshold β is raised for the regeneration. In the present embodiment, as shown in FIG. 8, five levels of the density threshold β are defined in advance in the density threshold table TL1. When the binary image GN is regenerated, the density threshold β is raised by adopting the value of the level one step above the level used the previous time the binary image GN of that block image GR was generated. For example, if the binary image GN was previously generated with the level 3 density threshold β, it is regenerated with the level 4 density threshold β.

On the other hand, if the character background density Hd is less than the dark/light boundary threshold γ, the character portion of the binary image GN was presumably faint in the previous attempt, so the density threshold β is lowered to the level one step below the previous one and the binary image GN is regenerated.

The regenerated binary image GN is again subjected to character recognition processing, calculation of the character recognition rate Rc, and quality determination by the character recognition processing unit 173, the character recognition rate calculation unit 174, and the binary image quality determination unit 175. If it is again determined to be defective, the level is raised or lowered by one more step and the binary image GN is regenerated. This is repeated until the binary image GN is determined to be good. However, if the density threshold β cannot be raised or lowered any further, the last generated binary image GN is regarded as good and is used for the processing by the character image integration unit 177 and the subsequent units described next.
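The choice of the next level can be illustrated with a short sketch. The five-level structure of table TL1 is taken from the embodiment, but the threshold values below, the function name `next_level`, and the use of `None` to signal that no further level exists are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical contents of density threshold table TL1 (level -> beta).
# Only the five-level structure comes from the embodiment; the values are assumed.
TL1 = {1: 64, 2: 96, 3: 128, 4: 160, 5: 192}


def next_level(level: int, hd: float, gamma: float) -> Optional[int]:
    """Return the level to retry with, or None when no further level exists."""
    # Dark background (Hd >= gamma): raise beta. Otherwise: lower beta.
    candidate = level + 1 if hd >= gamma else level - 1
    return candidate if candidate in TL1 else None


print(next_level(3, hd=150, gamma=100))  # dark background: level 3 moves to level 4
print(next_level(1, hd=50, gamma=100))   # light background at level 1: no level left
```

When `next_level` returns `None`, the last generated binary image would be treated as good, as described above.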

The character image integration unit 177 arranges the good binary images GN of the block images GR, which are the divided images of the character region image GM and were generated by the binary image generation unit 172, according to their respective character string position data 82, and integrates them into one image. This integrated image is obtained by removing the background portion from the character region image GM and extracting only the character portion. Hereinafter, the image generated by integrating these binary images GN is referred to as the "character extraction image GT".
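The integration step can be sketched as pasting each block onto a blank canvas. The sketch assumes that the character string position data 82 reduces to the top-left offset `(x, y)` of each block within the character region and that binary images are lists of rows of 0/1 values; neither representation is specified in the embodiment.

```python
from typing import List, Sequence, Tuple

Binary = List[List[int]]


def integrate_blocks(
    width: int, height: int, blocks: Sequence[Tuple[Tuple[int, int], Binary]]
) -> Binary:
    """Paste each block's binary image GN at its position to form image GT."""
    canvas = [[0] * width for _ in range(height)]
    for (x0, y0), binary in blocks:
        for dy, row in enumerate(binary):
            for dx, bit in enumerate(row):
                if bit:  # copy character pixels only; the rest stays blank
                    canvas[y0 + dy][x0 + dx] = 1
    return canvas


gt = integrate_blocks(
    width=5,
    height=2,
    blocks=[((0, 0), [[1, 0], [1, 1]]), ((3, 0), [[0, 1], [1, 0]])],
)
for row in gt:
    print(row)
```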

The background image generation unit 178 generates a background image GB, which is an image of only the background portion of the character region RM. That is, it generates an image that has the same shape as the character region image GM and is filled throughout with the same color or pattern as the background portion of the character region image GM. The color or pattern can be detected by examining the pixels near the edges of the character region image GM. Because the background image GB serves as the background of the characters, its color or pattern may be corrected to a lower tone than the detected color or pattern so that the characters are easier to read when the document image GA0 is reproduced.
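One way to realize the edge-pixel detection is sketched below. The embodiment only states that pixels near the edge are examined; taking the most frequent value on the one-pixel border, and handling a solid color rather than a pattern, are assumptions of this sketch, as are the function names.

```python
from collections import Counter


def detect_background_color(image):
    """Return the most frequent pixel value on the one-pixel border of `image`.

    `image` is a list of rows; each pixel is a hashable value such as an RGB tuple.
    """
    height, width = len(image), len(image[0])
    border = [
        image[y][x]
        for y in range(height)
        for x in range(width)
        if y in (0, height - 1) or x in (0, width - 1)
    ]
    return Counter(border).most_common(1)[0][0]


def make_background(image):
    """Build an image of the same shape filled with the detected background color."""
    color = detect_background_color(image)
    return [[color] * len(row) for row in image]


cream, black = (255, 255, 200), (0, 0, 0)
gm = [
    [cream, cream, cream, cream],
    [cream, black, black, cream],
    [cream, cream, cream, cream],
]
gb = make_background(gm)
print(gb[1][1])
```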

または、文字領域画像ＧＭの分割画像であるブロック画像ＧＲごとに背景画像を生成し、これらを元のブロック画像ＧＲの並び順に並べて統合することによって背景画像ＧＢを生成してもよい。 Alternatively, the background image GB may be generated by generating a background image for each block image GR that is a divided image of the character area image GM and arranging them in the order of arrangement of the original block images GR.

文字背景画像出力部１７９は、生成された文字抽出画像ＧＴおよび背景画像ＧＢのそれぞれのデータを、画像圧縮処理部１０８（図３参照）に出力する。 The character background image output unit 179 outputs the generated data of the character extraction image GT and the background image GB to the image compression processing unit 108 (see FIG. 3).

図１０は文字背景画像生成処理の流れの例を説明するフローチャート、図１１は二値画像生成処理の流れの例を説明するフローチャート、図１２はファイルＦＬの構成の例を示す図である。 FIG. 10 is a flowchart for explaining an example of the flow of the character background image generation process, FIG. 11 is a flowchart for explaining an example of the flow of the binary image generation process, and FIG. 12 is a diagram showing an example of the configuration of the file FL.

Here, the overall processing flow of the character background image generation unit 107 is described with reference to the flowcharts of FIGS. 10 and 11, taking as an example the case where the character region image GM of the character region RMb in FIG. 4 is separated into a character portion and a background portion to generate the character extraction image GT and the background image GB.

The character background image generation unit 107 groups the character images of the characters "デジタル機の価格差は…期待できます。" contained in the character region image GM of the character region RMb into groups of about 1 to 10 character images each, as shown in FIG. 7, thereby dividing the image into block images GR (GR1, GR2, ...) (#11 in FIG. 10).

Steps #12 to #14 are performed on the first block image GR1. That is, the character string position data 82 is obtained by determining where the block image GR1 lies within the character region RMb (#12), and the background density of the block image GR1 (the character background density Hd) is detected (#13).

ブロック画像ＧＲ１の二値画像ＧＮを、図１１に示すような手順で生成する（＃１４）。すなわち、まず、濃度閾値テーブルＴＬ１（図８参照）に基づいて濃度閾値βをリセットし初期レベルの値に設定しておく（＃２１）。濃度閾値βを閾値としてブロック画像ＧＲ１の各画素を二値化し、二値画像ＧＮを生成する（＃２２）。そして、文字認識処理を実行することによって、その二値画像ＧＮの中の各文字画像の文字認識を行い、文字認識率Ｒｃを算出する（＃２３）。 A binary image GN of the block image GR1 is generated by a procedure as shown in FIG. 11 (# 14). That is, first, based on the density threshold table TL1 (see FIG. 8), the density threshold β is reset and set to an initial level value (# 21). Each pixel of the block image GR1 is binarized using the density threshold β as a threshold to generate a binary image GN (# 22). Then, by executing the character recognition process, the character recognition of each character image in the binary image GN is performed, and the character recognition rate Rc is calculated (# 23).
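The binarization in step #22 can be sketched as a per-pixel comparison. The sketch assumes that each pixel holds a density value where larger means darker, and that a pixel whose density is at least β is treated as a character pixel; the embodiment does not state the comparison direction or the value range explicitly.

```python
def binarize(block, beta):
    """Return a binary image: 1 where density >= beta (character), else 0."""
    return [[1 if density >= beta else 0 for density in row] for row in block]


# Density values (assumed 0-255, larger is darker) of a tiny block image.
gr1 = [
    [30, 200, 210, 40],
    [35, 190, 60, 45],
]
print(binarize(gr1, beta=128))
```

With this convention, raising β leaves fewer pixels classified as characters (countering crushed characters on a dark background), and lowering β leaves more (countering faint characters).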

If the calculated character recognition rate Rc is equal to or higher than the recognition threshold α (No in #24), the characters are regarded as appearing clearly in the binary image GN, and the binary image GN generated this time is adopted as the good binary image GN of the block image GR1 (#34).

一方、文字認識率Ｒｃが認識閾値α未満である場合は（＃２４でＹｅｓ）、より鮮明な文字画像が表れるように、次のように濃度閾値βを変更して二値画像ＧＮの生成をやり直す。 On the other hand, when the character recognition rate Rc is less than the recognition threshold value α (Yes in # 24), the density threshold value β is changed as follows to generate a binary image GN so that a clearer character image appears. Try again.

If the character background density Hd of the block image GR1 is equal to or higher than the dark/light boundary threshold γ (Yes in #25), the binary image GN presumably failed to show the characters clearly because the background portion of the block image GR1 is dark. The level is therefore raised by one based on the density threshold table TL1 to increase the density threshold β (#26), and the binary image GN is regenerated with that density threshold β (#27). Character recognition processing is then performed on the regenerated binary image GN (#28), and if the character recognition rate Rc is equal to or higher than the recognition threshold α (No in #29), the latest (that is, the just regenerated) binary image GN is adopted as the good binary image GN of the block image GR1 (#34).

If the character background density Hd of the block image GR1 is less than the dark/light boundary threshold γ (No in #25), the characters appearing in the binary image GN are presumably faint. The level is therefore lowered by one to decrease the density threshold β (#30), and the binary image GN is regenerated with that density threshold β (#31). Character recognition processing is then performed on the regenerated binary image GN (#32), and if the character recognition rate Rc is equal to or higher than the recognition threshold α (No in #33), the just regenerated binary image GN is adopted as the good binary image GN of the block image GR1 (#34).

依然として文字認識率Ｒｃが認識閾値α未満である場合は（＃２９でＹｅｓまたは＃３３でＹｅｓ）、さらに濃度閾値βのレベルを上下させて、二値画像ＧＮの生成および画像認識処理をやり直す。ただし、これ以上レベルを上げることができない場合（＃２９でＮｏ）または下げることができない場合は（＃３３でＮｏ）、最後に生成した二値画像ＧＮを、ブロック画像ＧＲ１の良好な二値画像ＧＮとみなす（＃３４）。 If the character recognition rate Rc is still less than the recognition threshold α (Yes in # 29 or Yes in # 33), the level of the density threshold β is further increased and decreased to generate the binary image GN and perform the image recognition process again. However, if the level cannot be further increased (No in # 29) or cannot be decreased (No in # 33), the binary image GN generated last is used as a good binary image of the block image GR1. It is regarded as GN (# 34).
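The control flow of FIG. 11 as described above can be sketched as a single loop. This is an illustration under stated assumptions: the threshold list stands in for table TL1 with assumed values and zero-based indices (index 0 corresponds to level 1), `recognize` is a hypothetical callback standing in for units 173 and 174, and densities are assumed to be 0 to 255 with larger meaning darker.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]


def generate_binary_image(
    block: Image,
    hd: float,
    alpha: float,
    gamma: float,
    thresholds: Sequence[int],
    initial_index: int,
    recognize: Callable[[Image], float],
) -> Tuple[Image, int]:
    """Binarize `block`, retrying with adjusted thresholds until Rc >= alpha.

    Returns the adopted binary image and the index of the threshold used.
    """
    step = 1 if hd >= gamma else -1  # 25: dark background raises beta
    index = initial_index            # 21: start from the initial level
    while True:
        beta = thresholds[index]
        binary = [[1 if d >= beta else 0 for d in row] for row in block]  # 22, 27, 31
        if recognize(binary) >= alpha:  # 24, 29, 33
            return binary, index        # 34: adopt as the good binary image
        if not 0 <= index + step < len(thresholds):
            return binary, index        # no level left: regard the last image as good
        index += step                   # 26 or 30


# Stand-in recognizer: reports full recognition only when exactly one pixel is set.
def fake_ocr(img: Image) -> float:
    return 1.0 if sum(img[0]) == 1 else 0.0


block = [[200, 120, 40]]
levels = [32, 64, 96, 128, 160]
binary, index = generate_binary_image(block, 100, 0.9, 80, levels, 2, fake_ocr)
print(binary, index)
```

The numbers in the comments refer to the step numbers of FIG. 11. The direction of adjustment is fixed once from Hd, so the loop only ever moves toward one end of the table.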

Returning to FIG. 10, once the binary image GN of the block image GR1 has been obtained, the binary images GN of the remaining block images GR (GR2, GR3, ...) are generated in sequence. The generation procedure is the same as that described above for the binary image GN of the block image GR1 (#12 to #14).

すべてのブロック画像ＧＲの良好な二値画像ＧＮが生成されたら（＃１５でＹｅｓ）、これらの二値画像ＧＮを、ステップ＃１２で求めた文字列位置データ８２に基づいて元の並び順に並べて１つの画像に統合することによって、文字抽出画像ＧＴを生成する（＃１６）。 When good binary images GN of all the block images GR are generated (Yes in # 15), these binary images GN are arranged in the original arrangement order based on the character string position data 82 obtained in Step # 12. A character extraction image GT is generated by integrating the images into one image (# 16).

ブロック画像ＧＲの背景部分の色または模様を検出し、背景画像ＧＢを生成する（＃１７）。 The color or pattern of the background portion of the block image GR is detected, and the background image GB is generated (# 17).

図３に戻って、画像圧縮処理部１０８は、文字領域ＲＭの文字部分だけの画像すなわち二値画像ＧＮの画像データおよび背景部分だけの画像すなわち背景画像ＧＢの画像データを文字背景画像生成部１０７から取得すると、これらの画像データを圧縮処理する。文字抽出画像ＧＴは、モノクロビットマップの画像であるので、Ｇ４圧縮方式（ＭＭＲ圧縮方式）などで圧縮するのが望ましい。背景画像ＧＢは、色または模様の画像なので、ＧＩＦまたはＪＰＥＧなどの圧縮方式で圧縮するのが望ましい。以下、圧縮された文字抽出画像ＧＴの画像データおよび背景画像ＧＢの画像データをそれぞれ「圧縮文字画像データＤＴＡ」および「圧縮背景画像データＤＴＢ」と記載する。 Returning to FIG. 3, the image compression processing unit 108 converts the image of only the character portion of the character region RM, that is, the image data of the binary image GN, and the image of only the background portion, that is, the image data of the background image GB, to the character background image generation unit 107. If acquired from the above, these image data are compressed. Since the character extraction image GT is a monochrome bitmap image, it is desirable to compress it using the G4 compression method (MMR compression method) or the like. Since the background image GB is a color or pattern image, it is desirable to compress the background image GB by a compression method such as GIF or JPEG. Hereinafter, the image data of the compressed character extraction image GT and the image data of the background image GB are referred to as “compressed character image data DTA” and “compressed background image data DTB”, respectively.

The image compression processing unit 108 also compresses the image data of the non-character region RH. Because large images such as photographs are placed in the non-character region RH, the data is compressed with a lossy compression method such as JPEG so that its size becomes small. Alternatively, the image type may be determined for each non-character region image GH, and a compression method such as JPEG or GIF may be selected accordingly. Hereinafter, the image data of the non-character region RH compressed by the image compression processing unit 108 is referred to as "compressed non-character image data DTC".

The file generation unit 109 generates a file FL for reproducing the document image GA0 of the document read by the scanner 10g, using the compressed data obtained by the image compression processing unit 108, the position information of each image, and the like, for example as follows.

原稿画像ＧＡ０を構成する各領域の圧縮文字画像データＤＴＡ、圧縮背景画像データＤＴＢ、圧縮非文字画像データＤＴＣ、および領域位置データ８１を用意する。さらに、生成するファイルＦＬの属性を示すファイル属性データ８３を生成し用意する。属性の一部（例えばファイル名など）は、ユーザに指定させるようにしてもよい。 Compressed character image data DTA, compressed background image data DTB, compressed non-character image data DTC, and region position data 81 for each area constituting the document image GA0 are prepared. Further, file attribute data 83 indicating the attribute of the file FL to be generated is generated and prepared. A part of the attribute (for example, a file name) may be specified by the user.

同じ文字領域ＲＭの圧縮文字画像データＤＴＡと圧縮背景画像データＤＴＢと領域位置データ８１とを対応付け、同じ非文字領域ＲＨの圧縮背景画像データＤＴＢと領域位置データ８１とを対応付ける。そして、これらの圧縮文字画像データＤＴＡ、圧縮背景画像データＤＴＢ、圧縮非文字画像データＤＴＣ、領域位置データ８１、および属性データ８３を１つに統合することによって、ファイル化する。このようにして、ファイルＦＬが生成される。 The compressed character image data DTA, the compressed background image data DTB, and the region position data 81 in the same character region RM are associated with each other, and the compressed background image data DTB in the same non-character region RH and the region position data 81 are associated with each other. Then, the compressed character image data DTA, the compressed background image data DTB, the compressed non-character image data DTC, the region position data 81, and the attribute data 83 are integrated into one file. In this way, the file FL is generated.

For example, suppose that a document image GA0 such as that in FIG. 4 is input by the scanner 10g and processed by the preprocessing unit 101 through the image compression processing unit 108 in FIG. 3, with the following results: compressed character image data DTAa, compressed background image data DTBa, and region position data 81a are obtained as the data of the character region RMa; compressed character image data DTAb, compressed background image data DTBb, and region position data 81b are obtained as the data of the character region RMb; and compressed non-character image data DTCc and region position data 81c are obtained as the data of the non-character region RHa. In such a case, the file generation unit 109 generates the attribute data 83, arranges and associates the data for each region, and generates a file FL having a file configuration such as that shown in FIG. 12.
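The association of data per region can be pictured with a small in-memory structure. This sketch does not reproduce the layout of FIG. 12; the class names, the field names, the `(x, y, width, height)` form of the region position data 81, and the placeholder byte strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class RegionEntry:
    """Data associated with one region of the document image."""
    position: Tuple[int, int, int, int]  # region position data 81 (assumed x, y, w, h)
    dta: Optional[bytes] = None          # compressed character image data
    dtb: Optional[bytes] = None          # compressed background image data
    dtc: Optional[bytes] = None          # compressed non-character image data


@dataclass
class FileFL:
    """In-memory picture of file FL: attribute data 83 plus per-region entries."""
    attributes: Dict[str, str]
    regions: List[RegionEntry] = field(default_factory=list)


fl = FileFL(attributes={"name": "scan001"})
fl.regions.append(RegionEntry((20, 20, 400, 60), dta=b"DTAa", dtb=b"DTBa"))    # RMa
fl.regions.append(RegionEntry((20, 100, 400, 200), dta=b"DTAb", dtb=b"DTBb"))  # RMb
fl.regions.append(RegionEntry((20, 320, 400, 300), dtc=b"DTCc"))               # RHa
print(len(fl.regions))
```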

なお、同じ種類のオブジェクトの領域が複数個ある場合は、これらの領域を１つに統合し、全体のオブジェクト数を減らす処理を行ってもよい。 If there are a plurality of areas of the same type of object, these areas may be integrated into one and processing for reducing the total number of objects may be performed.

生成されたファイルＦＬは、ハードディスク１０ｄに保存される。または、電子メールまたはＦＴＰなどによって通信回線３を介して端末装置２などに転送される。ファイルＦＬを、アクロバット社のＰＤＦなどの既存のフォーマットに従って生成してもよい。 The generated file FL is stored in the hard disk 10d. Alternatively, it is transferred to the terminal device 2 or the like via the communication line 3 by e-mail or FTP. The file FL may be generated in accordance with an existing format such as Acrobat PDF.

The image of the generated file FL is reproduced on the terminal device 2 or the like as follows. The terminal device 2 starts application software that supports the file FL (for example, Acrobat Reader from Adobe Systems when the file FL is a PDF file) and loads the file FL into RAM. It reproduces the character image based on the compressed character image data DTA in the file FL and places the image at the position indicated by the associated region position data 81. When compressed background image data DTB is associated, it reproduces the background image based on that data and superimposes the character image on it in transparent form. Likewise, for the compressed non-character image data DTC, it reproduces an image such as a photograph based on the data and places the image at the position indicated by the associated region position data 81. In this way, the document image GA0 is reproduced.
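The transparent superimposition can be sketched as a mask operation: character pixels are drawn in a text color and all other pixels show the background. The representation of images as lists of rows, the function name `overlay`, and the default black text color are assumptions of this sketch.

```python
def overlay(background, char_mask, text_color=(0, 0, 0)):
    """Draw character pixels over the background; non-character pixels stay transparent."""
    return [
        [text_color if bit else bg for bg, bit in zip(bg_row, mask_row)]
        for bg_row, mask_row in zip(background, char_mask)
    ]


cream = (255, 255, 200)
background = [[cream] * 3 for _ in range(2)]
char_mask = [[0, 1, 0], [1, 1, 0]]
page = overlay(background, char_mask)
print(page[0])
```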

図１３はスキャンジョブを実行する際の画像形成装置１の全体的な処理の流れの例を説明するフローチャートである。次に、原稿をスキャンして原稿画像の電子データを生成する際の画像形成装置１の全体的な処理の流れを、図１３のフローチャートを参照して説明する。 FIG. 13 is a flowchart illustrating an example of the overall processing flow of the image forming apparatus 1 when executing a scan job. Next, the overall processing flow of the image forming apparatus 1 when the original is scanned to generate electronic data of the original image will be described with reference to the flowchart of FIG.

ユーザは、電子データ化したい原稿を画像形成装置１の原稿台にセットし、操作パネル１０ｆを操作してスキャン指令を画像形成装置１に対して与える。 The user sets a document to be converted into electronic data on the document table of the image forming apparatus 1, and operates the operation panel 10 f to give a scan command to the image forming apparatus 1.

すると、画像形成装置１は、セットされた原稿をスキャンして原稿画像ＧＡ０を入力する。つまり、原稿画像ＧＡ０のデータを取得する（図１３の＃１）。そして、次のように、その原稿画像ＧＡ０のファイルＦＬを生成するための処理を開始する。すなわち、まず、原稿画像ＧＡ０に対して解像度変換処理および下地除去処理を施す（＃２）。これにより、原稿画像ＧＡ０’が得られる。原稿画像ＧＡ０’の各画素の明度を算出することによって明度画像ＧＡ１を生成し（＃３）、これにスムージング処理を施す（＃４）。これにより、明度画像ＧＡ１’が得られる。 Then, the image forming apparatus 1 scans the set original and inputs the original image GA0. That is, the data of the document image GA0 is acquired (# 1 in FIG. 13). Then, processing for generating the file FL of the document image GA0 is started as follows. That is, first, resolution conversion processing and background removal processing are performed on the document image GA0 (# 2). Thereby, the document image GA0 'is obtained. A brightness image GA1 is generated by calculating the brightness of each pixel of the document image GA0 '(# 3), and a smoothing process is performed on it (# 4). Thereby, the brightness image GA1 'is obtained.
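Steps #3 and #4 can be illustrated as follows. The embodiment does not specify how brightness is computed or which smoothing filter is used; this sketch assumes a weighted sum of RGB components (the common 0.299/0.587/0.114 luma weights) and a 3x3 mean filter that uses only the neighbors inside the image at the borders.

```python
def lightness(rgb):
    """Approximate brightness of one RGB pixel (assumed weighting)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def to_lightness_image(image):
    """Convert an RGB image (list of rows of tuples) to a brightness image."""
    return [[lightness(p) for p in row] for row in image]


def smooth(image):
    """Apply a 3x3 mean filter, averaging only the neighbors that exist."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                image[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


ga1 = [[0, 0, 0], [0, 90, 0], [0, 0, 0]]
print(smooth(ga1)[1][1])
```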

明度画像ＧＡ１’に対してラベリング処理を施すことによって、原稿画像ＧＡ０’の中の文字領域ＲＭおよび非文字領域ＲＨを検出する（＃５）。このとき、これらの領域の位置を示す領域位置データ８１を生成しておく。 By performing a labeling process on the brightness image GA1 ', the character area RM and the non-character area RH in the document image GA0' are detected (# 5). At this time, area position data 81 indicating the positions of these areas is generated.

From the character region image GM of the character region RM, an image of only the characters, that is, the binary image GN, and an image of only the background, that is, the non-character region image GH, are generated (#6). At this time, the binary image GN is optimized as necessary so that the characters become clear. The procedure for this processing is as described earlier with reference to FIGS. 10 and 11. The non-character region image GH of the non-character region RH may also be subjected to processing that corrects photographs, illustrations, or the like using a known method.

文字領域画像ＧＭ、背景画像ＧＢ、非文字領域画像ＧＨの各画像データを圧縮し、圧縮文字画像データＤＴＡ、圧縮背景画像データＤＴＢ、および圧縮非文字画像データＤＴＣを生成する（＃７）。 The image data of the character area image GM, background image GB, and non-character area image GH are compressed to generate compressed character image data DTA, compressed background image data DTB, and compressed non-character image data DTC (# 7).

そして、これらの圧縮文字画像データＤＴＡ、圧縮非文字画像データＤＴＣ、圧縮背景画像データＤＴＢ、各領域の領域位置データ８１、および属性データ８３などを統合することによって、ファイルＦＬを生成する（＃８）。 The file FL is generated by integrating the compressed character image data DTA, the compressed non-character image data DTC, the compressed background image data DTB, the region position data 81 of each region, the attribute data 83, and the like (# 8). ).

本実施形態によると、スキャンによって得られた原稿画像ＧＡ０に含まれる文字画像に背景の装飾が含まれいたり文字画像に過度な下地除去処理が施されたりして、文字画像が不鮮明になっても、その文字画像から文字部分だけの画像を上手く抽出することができる。 According to the present embodiment, even if the character image included in the original image GA0 obtained by scanning contains background decoration or the character image is subjected to excessive background removal processing, the character image becomes unclear. , It is possible to successfully extract an image of only the character portion from the character image.

また、文字領域の判定および文字下地濃度Ｈｄの判別を行った上で適正化の処理を行うので、処理に掛かる時間やメモリなどの資源の使用を抑えることができる。 In addition, since the optimization process is performed after the determination of the character region and the character background density Hd, it is possible to suppress the processing time and the use of resources such as memory.

In the present embodiment, the image forming apparatus 1, in which all the functions of the units in FIGS. 2 and 3 are integrated (such as reading a document image, converting the image format, correcting the image, and converting image data into a file), has been described as an example. However, the functions of the units in FIG. 3 can also be distributed across a plurality of devices. For example, the function of reading a document image may be realized by a scanner for a personal computer, and the other functions by the personal computer. In this case, a scanner driver and a computer program for executing the processing described with reference to FIGS. 10, 11, and 13 are installed on the personal computer. Alternatively, the image data of the document image GA0 read by the image forming apparatus 1 may be transferred to the terminal device 2, and the terminal device 2 may execute the processing described with reference to FIGS. 10, 11, and 13.

本実施形態では、５段階のレベルで濃度閾値βを調整したが、もっと細かいレベルで調整するようにしてもよい。 In the present embodiment, the density threshold β is adjusted at five levels, but may be adjusted at a finer level.

本実施形態では、スキャナで読み取った原稿の画像をファイル化する場合を例に説明したが、本発明は、それ以外の種類の画像読取装置で取得した画像をファイル化する場合にも適用可能である。例えば、デジタルカメラで撮影した原稿の画像をファイル化するためにも適用することができる。 In this embodiment, the case where an image of a document read by a scanner is converted into a file has been described as an example. However, the present invention can also be applied to a case where an image acquired by another type of image reading apparatus is converted into a file. is there. For example, the present invention can be applied to file an image of a document shot with a digital camera.

文字領域画像ＧＭに対して文字認識処理を行った際に得られた、その文字領域画像ＧＭの表す文字列のテキストデータを、ファイルＦＬに含めるようにしてもよい。 The text data of the character string represented by the character region image GM obtained when the character recognition process is performed on the character region image GM may be included in the file FL.

本実施形態では、文字領域ＲＭおよび非文字領域ＲＨの検出を自動で行ったが、ユーザがマウスでドラックするなどして指定できるようにしてもよい。 In the present embodiment, the character area RM and the non-character area RH are automatically detected. However, the user may be able to specify them by dragging with the mouse.

その他、画像形成装置１の全体または各部の構成、処理内容、処理順序、テーブルの内容などは、本発明の趣旨に沿って適宜変更することができる。 In addition, the configuration of the entire image forming apparatus 1 or each unit, processing contents, processing order, table contents, and the like can be appropriately changed in accordance with the spirit of the present invention.

本発明は、特に、ＭＦＰまたはパーソナルコンピュータなどの画像処理装置において、スキャンした原稿画像のＰＤＦファイルなどを生成するために好適に用いられる。 The present invention is particularly suitable for generating a PDF file of a scanned document image in an image processing apparatus such as an MFP or a personal computer.

FIG. 1 is a diagram showing an example of a network configuration including an image forming apparatus according to the present invention.
FIG. 2 is a diagram showing an example of the hardware configuration of the image forming apparatus.
FIG. 3 is a diagram showing an example of the functional configuration of the image forming apparatus.
FIG. 4 is a diagram showing an example of a document image.
FIG. 5 is a diagram showing examples of defective character images.
FIG. 6 is a diagram showing an example of the configuration of the character background image generation unit.
FIG. 7 is a diagram showing an example of grouping the character images contained in a character region.
FIG. 8 is a diagram showing an example of the density threshold table.
FIG. 9 is a diagram showing examples of defective binary images and a good binary image.
FIG. 10 is a flowchart explaining an example of the flow of the character background image generation process.
FIG. 11 is a flowchart explaining an example of the flow of the binary image generation process.
FIG. 12 is a diagram showing an example of the configuration of a file.
FIG. 13 is a flowchart explaining an example of the overall processing flow of the image forming apparatus when executing a scan job.

Explanation of symbols

１画像形成装置（文字画像抽出装置）
１０ｇスキャナ（画像入力装置、画像ファイル生成装置）
１０４領域検出部（文字領域検出手段）
１０８画像圧縮処理部（データ圧縮手段）
１０９ファイル生成部（画像ファイル生成手段）
１７０文字領域画像分割部（分割手段）
１７２二値画像生成部（二値画像変換手段）
１７３文字認識処理部（文字認識処理手段）
１７４文字認識率算出部（文字認識率算出手段）
１７６再実行指令部（再変換制御手段）
１７７文字画像統合部（統合画像生成手段）
１７９文字背景画像出力部（出力手段）
ＦＬファイル（画像ファイル）
Ｈｄ文字下地濃度
ＧＡ０原稿画像（入力画像）
ＧＮ二値画像（文字画像）
ＧＲブロック画像
ＧＴ文字抽出画像（統合画像）
Ｒｃ文字認識率
ＲＨ非文字領域
ＲＭ文字領域
α 認識閾値（第二の閾値）
β 濃度閾値（第一の閾値）
γ 濃薄境界閾値（第三の閾値）

1 Image forming device (character image extraction device)
10g scanner (image input device, image file generation device)
104 area detection unit (character area detection means)
108 Image compression processing unit (data compression means)
109 File generation unit (image file generation means)
170 Character area image dividing unit (dividing means)
172 Binary image generation unit (binary image conversion means)
173 Character recognition processing unit (character recognition processing means)
174 Character recognition rate calculation unit (character recognition rate calculation means)
176 Re-execution command section (re-conversion control means)
177 Character image integration unit (integrated image generation means)
179 Character background image output unit (output means)
FL file (image file)
Hd Character background density
GA0 Document image (input image)
GN Binary image (character image)
GR Block image
GT Character extraction image (integrated image)
Rc Character recognition rate
RH Non-character region
RM Character region
α Recognition threshold (second threshold)
β Density threshold (first threshold)
γ Dark/light boundary threshold (third threshold)

Claims

A character image extraction device that extracts a character image from an input image input by an image input device,
Binary image conversion that executes conversion processing for converting an image of a character area, which is an area in which one or more character images are arranged, from the input image into a binary image based on a first threshold. Means,
Character recognition processing means for executing character recognition processing on the converted binary image of the character region;
A character recognition rate calculating means for calculating a character recognition rate, which is a ratio of characters recognized by the character recognition process among characters represented by the binary image of the character region;
An output unit that outputs the binary image as a character image extraction result of the input image when the character recognition rate of the binary image of the character region exceeds a second threshold;
When the character recognition rate of the binary image of the character area is lower than the second threshold value, the binary image conversion is performed such that the first threshold value is changed and the conversion process is performed again for the character area. Reconversion control means for controlling the means;
A character image extracting apparatus comprising:

The re-conversion control means controls the binary image conversion means so as to raise the first threshold and re-execute the conversion process when the background density of the image of the character region in the input image exceeds a third threshold, and so as to lower the first threshold and re-execute the conversion process when the background density is lower than the third threshold.
The character image extraction apparatus according to claim 1.
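
The direction rule of this claim can be illustrated with a small helper. It is a sketch only: the step size, the 0 to 255 clamping range, and the function name are assumptions, and the claim does not say what happens when the background density equals the third threshold (the sketch leaves the first threshold unchanged in that case).

```python
def next_first_threshold(
    beta: int,
    background_density: int,
    gamma: int,
    step: int = 16,   # assumed step size; the claim gives no value
    lowest: int = 0,  # assumed 8-bit density range
    highest: int = 255,
) -> int:
    """Return the first threshold to use for the next conversion attempt."""
    if background_density > gamma:
        # Dark background: raise the threshold so less background becomes ink.
        return min(highest, beta + step)
    if background_density < gamma:
        # Light background: lower the threshold so faint strokes become ink.
        return max(lowest, beta - step)
    return beta  # equal case is not specified by the claim
```

A caller that loops on this helper would need its own stopping rule, for example a retry limit, because a clamped or unchanged threshold repeats the same conversion.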

A character image extraction device that extracts a character image from an input image input by an image input device,
A dividing unit that divides a character region, which is a region where a plurality of character images in the input image are arranged, into a plurality of blocks such that one or more character images are grouped;
Binary image conversion means for executing conversion processing for converting an image of the block of the input image into a binary image based on a first threshold;
Character recognition processing means for executing character recognition processing on the binary image of the converted block;
A character recognition rate calculating means for calculating a character recognition rate, which is a ratio of characters recognized by the character recognition process among characters represented by the binary image of the block;
Re-conversion control means for controlling the binary image conversion means, when the character recognition rate of the binary image of the block is lower than a second threshold, so as to change the first threshold and re-execute the conversion process for the block;
Integrated image generating means for generating an integrated image by arranging, in their original order, the binary images of the blocks of the character area whose character recognition rates exceed the second threshold, and integrating them into one image;
Output means for outputting the generated integrated image of the character region as a result of extraction of the character image of the input image;
A character image extracting apparatus comprising:
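
The block-wise variant can be sketched as follows. The assumptions are considerable and are labeled in the code: blocks are taken to be column ranges of a single text line, the candidate thresholds are supplied by the caller, and `recognize` is a hypothetical callable returning the recognized and total character counts. The claim itself does not fix any of these.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Image = List[List[int]]  # assumed: rows of 8-bit density values, larger = darker
Recognizer = Callable[[Image], Tuple[int, int]]  # hypothetical OCR interface


def split_into_blocks(region: Image, columns: Sequence[Tuple[int, int]]) -> List[Image]:
    """Divide the region into blocks; each block is an assumed column range."""
    return [[row[start:end] for row in region] for start, end in columns]


def integrate_blocks(blocks: List[Image]) -> Image:
    """Place the block images side by side in their original order."""
    if not blocks:
        return []
    return [[px for block in blocks for px in block[r]] for r in range(len(blocks[0]))]


def extract_by_blocks(
    region: Image,
    columns: Sequence[Tuple[int, int]],
    recognize: Recognizer,
    betas: Sequence[int],
    alpha: float,
) -> Optional[Image]:
    """Binarize each block with its own threshold, then integrate the results."""
    accepted: List[Image] = []
    for block in split_into_blocks(region, columns):
        for beta in betas:
            binary = [[1 if px >= beta else 0 for px in row] for row in block]
            recognized, total = recognize(binary)
            if total and recognized / total > alpha:
                accepted.append(binary)
                break
        else:
            return None  # assumption: fail if any block never exceeds alpha
    return integrate_blocks(accepted)
```

Because each block keeps the first threshold that worked for it, unevenly printed or unevenly blurred parts of one line can end up binarized with different thresholds.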

The re-conversion control means controls the binary image conversion means so as to raise the first threshold and re-execute the conversion process for the block when the background density of the image of the block exceeds a third threshold, and so as to lower the first threshold and re-execute the conversion process for the block when the background density is lower than the third threshold.
The character image extraction device according to claim 3.

An image file generation device that generates an image file of an input image input by an image input device,
A character area detecting means for detecting, from the input image, a character area in which one or more character images are continuously arranged and a non-character area in which an image other than characters is arranged;
A binary image conversion means for executing a conversion process of converting the detected image of the character region of the input image into a binary image based on a first threshold;
Character recognition processing means for executing character recognition processing on the converted binary image of the character region;
A character recognition rate calculating means for calculating a character recognition rate, which is a ratio of characters recognized by the character recognition process among characters represented by the binary image of the character region;
Re-conversion control means for controlling the binary image conversion means, in the case where the character recognition rate of the binary image of the character area is lower than a second threshold, so as to raise the first threshold and re-execute the conversion process for the character area when the background density of the character area exceeds a third threshold, and so as to lower the first threshold and re-execute the conversion process for the character area when the background density of the character area is lower than the third threshold;
Image file generation means for generating an image file by integrating into one the image data of the binary image of the character area whose character recognition rate is determined to exceed the second threshold, the image data of the image of the non-character area, position data indicating the position of each of the character area and the non-character area in the input image, and background color data for reproducing the background color of the character area;
An image file generation device characterized by comprising:
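
The data that this claim integrates into one file can be pictured with a small container type. This is an illustrative sketch only: the claim does not specify a file format, and the class names, the `(x, y, width, height)` position layout, and the RGB background color are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RegionEntry:
    kind: str                            # "character" or "non-character"
    position: Tuple[int, int, int, int]  # assumed layout: (x, y, width, height)
    image_data: bytes                    # binary image data or non-character image data
    background_color: Optional[Tuple[int, int, int]] = None  # assumed RGB


@dataclass
class ImageFile:
    entries: List[RegionEntry] = field(default_factory=list)


def build_image_file(regions: List[RegionEntry]) -> ImageFile:
    """Integrate per-region image, position, and background color data."""
    for region in regions:
        if region.kind not in ("character", "non-character"):
            raise ValueError(f"unknown region kind: {region.kind}")
        if region.kind == "character" and region.background_color is None:
            # The claim stores a background color so the character area
            # can be reproduced from the binary image alone.
            raise ValueError("character regions need background color data")
    return ImageFile(entries=list(regions))
```

Storing the background color separately is what allows the character area to be kept as a one-bit image without losing the look of the original page.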

Further comprising data compression means for compressing image data of an image according to the type of object represented by the image,
The image file generation means generates the image file by integrating the image data of the binary image of the character area and the image data of the image of the non-character area compressed by the data compression means.
The image file generation device according to claim 5.

A character image extraction method for extracting a character image from an input image input by an image input device,
Executing a conversion process of converting an image of a character area, which is an area in which one or more character images are arranged in the input image, into a binary image based on a first threshold;
Performing character recognition processing on the converted binary image of the character region;
Calculating a character recognition rate, which is a ratio of characters recognized by the character recognition process among characters represented by the binary image of the character region;
When the character recognition rate of the binary image of the character region exceeds a second threshold value, the binary image is output as a character image extraction result of the input image,
When the character recognition rate of the binary image of the character region is lower than the second threshold, the first threshold is changed and the conversion process is re-executed for the character region,
A character image extraction method characterized by that.

A computer program used in a computer for extracting a character image from an input image input by an image input device,
A conversion process of converting an image of a character area, which is an area in which one or more character images are arranged in the input image, into a binary image based on a first threshold;
A character recognition process for the binary image of the converted character region;
A calculation process for calculating a character recognition rate which is a ratio of characters recognized by the character recognition process among the characters represented by the binary image of the character region;
When the character recognition rate of the binary image of the character region exceeds a second threshold, an output process for outputting the binary image as a character image extraction result of the input image;
A control process that, when the character recognition rate of the binary image of the character area is lower than the second threshold, changes the first threshold and causes the conversion process to be re-executed for the character area;
A computer program for causing the computer to execute the processes listed above.