JP2005269271A

JP2005269271A - Image processing apparatus

Info

Publication number: JP2005269271A
Application number: JP2004079129A
Authority: JP
Inventors: Toshiya Koyama; 俊哉小山
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2004-03-18
Filing date: 2004-03-18
Publication date: 2005-09-29
Anticipated expiration: 2024-03-18
Also published as: JP4370950B2

Abstract

PROBLEM TO BE SOLVED: To provide an image processing apparatus capable of improving both the compression ratio and the image quality.
SOLUTION: The image processing apparatus extracts line-segment image elements from the image data to be processed. It then generates first partial image data consisting of those line-segment image elements and second partial image data excluding them. It determines representative colors for the first partial image data and limits the colors of that data to them. It likewise determines representative colors for the second partial image data and limits its colors to them. Finally, the color-limited image data is subjected to predetermined compression processing.
COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to an image processing apparatus that separates image data according to a predetermined rule and compresses the separated data.

Raster image data (hereinafter simply "image data" unless a distinction is needed) can contain many image elements with different properties, such as character (text) portions and natural-image (picture) portions. Because these elements differ in character, it is often preferable to process each one differently. For example, different compression methods may suit different elements.

For this reason, image processing known as T/I (text/image) separation has long been researched and developed. Techniques have been proposed that separate characters from pictures and compress each separately (for example, Patent Documents 1 and 2).
Patent Document 1: JP 2002-165105 A
Patent Document 2: JP 2002-175532 A

However, the method disclosed in Patent Document 1, for example, simply applies lossy compression such as JPEG (Joint Photographic Experts Group) to everything other than characters. As a result, JPEG compression is also applied to the contour lines within pictures and to the parts of line drawings that could have been reduced to a limited set of colors. In such cases, sufficient performance may not be achieved in terms of either compression ratio or image quality.

The present invention has been made in view of these circumstances. One of its objects is to provide an image processing apparatus that improves both the compression ratio and the image quality. It does so by applying color limitation wherever possible, including in portions other than characters.

To solve the above problems of the conventional art, the present invention provides an image processing apparatus that includes the following:
(a) means for extracting line-segment image elements from the image data to be processed, and for generating first partial image data consisting of those line-segment image elements and second partial image data excluding them;
(b) first color-limiting means for determining at least one representative color based on the values of the pixels in the first partial image data, and for generating first limited-color image data by limiting the colors of the first partial image data to the determined representative color(s); and
(c) second color-limiting means for determining at least one representative color based on the values of the pixels in the second partial image data, and for generating second limited-color image data by limiting the colors of the second partial image data to the determined representative color(s).
The first and second limited-color image data are subjected to predetermined compression processing.
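The color limitation described above can be pictured as mapping every pixel of a partial image to its nearest representative color, with each partial image using its own palette. The following is a minimal sketch under stated assumptions. Pixels are assumed to be RGB tuples, and squared Euclidean distance is assumed as the nearness measure; the source specifies neither. The names `nearest_color` and `limit_colors` are hypothetical.

```python
from typing import List, Sequence, Tuple

Color = Tuple[int, int, int]


def nearest_color(pixel: Color, palette: Sequence[Color]) -> Color:
    """Return the palette entry closest to `pixel` (squared Euclidean distance; an assumption)."""
    return min(palette, key=lambda c: sum((a - b) ** 2 for a, b in zip(pixel, c)))


def limit_colors(pixels: List[Color], palette: Sequence[Color]) -> List[Color]:
    """Replace every pixel by its nearest representative color."""
    return [nearest_color(p, palette) for p in pixels]


# Hypothetical data: the line-segment pixels and the remaining pixels are
# color-limited independently, each with its own representative colors.
line_pixels = [(12, 8, 10), (250, 2, 5), (3, 0, 1)]
rest_pixels = [(240, 238, 236), (120, 122, 119)]
first_limited = limit_colors(line_pixels, [(0, 0, 0), (255, 0, 0)])
second_limited = limit_colors(rest_pixels, [(255, 255, 255), (128, 128, 128)])
```

Keeping separate palettes means that, for example, a red contour line is not pulled toward the gray tones of the surrounding picture.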

The image data to be processed may be, for example, image data obtained by removing the character portions from the original image data.

The first color-limiting means may compute the number of significant pixel values contained in the first partial image data. Only when that number does not exceed a predetermined first threshold does it determine at least one representative color and generate the first limited-color image data by limiting the colors of the first partial image data to the determined representative color(s).

Likewise, the second color-limiting means may compute the number of significant pixel values contained in the second partial image data. Only when that number does not exceed a predetermined second threshold does it determine at least one representative color and generate the second limited-color image data by limiting the colors of the second partial image data to the determined representative color(s).
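The threshold test in the two preceding paragraphs could look like the following sketch. The source does not define "significant" pixel values. Here they are assumed to be the distinct values that occur at least `min_count` times, where `min_count` is a hypothetical parameter. Returning `None` stands for "leave this partial image un-limited", which is also an assumption. All function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple

Color = Tuple[int, int, int]


def significant_colors(pixels: List[Color], min_count: int) -> List[Color]:
    """Distinct pixel values occurring at least `min_count` times, most frequent first."""
    return [c for c, n in Counter(pixels).most_common() if n >= min_count]


def limit_if_few_colors(
    pixels: List[Color], threshold: int, min_count: int
) -> Optional[List[Color]]:
    """Color-limit `pixels` only if the number of significant values is <= threshold.

    Returns None when there are too many significant colors (or none at all);
    such a partial image would then be handled by some other compression.
    """
    palette = significant_colors(pixels, min_count)
    if not palette or len(palette) > threshold:
        return None
    # Map each pixel to the nearest significant value (squared Euclidean distance).
    return [
        min(palette, key=lambda c: sum((a - b) ** 2 for a, b in zip(p, c)))
        for p in pixels
    ]
```

With a threshold of 2, a partial image dominated by black and white is reduced to those two colors, while a photographic region with many frequent colors is left alone.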

The apparatus may further include means for generating a plurality of color-specific image data based on the first and second limited-color image data. Each color-specific image data consists of those pixels of the first and second limited-color image data that are judged to have the same color.
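One way to picture the color-specific image data is as one binary mask per color, built across both limited-color images. In this sketch, "judged to have the same color" is assumed to mean exact equality after color limitation. Pixels that do not belong to a given partial image are represented as `None`; this encoding and the name `split_by_color` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Color = Tuple[int, int, int]
Image = List[List[Optional[Color]]]  # None = pixel not part of this partial image


def split_by_color(*images: Image) -> Dict[Color, List[List[int]]]:
    """Build one binary mask per color from same-sized limited-color images."""
    h, w = len(images[0]), len(images[0][0])
    masks: Dict[Color, List[List[int]]] = {}
    for img in images:
        for y, row in enumerate(img):
            for x, color in enumerate(row):
                if color is None:
                    continue
                # Pixels of the same color from either image share one mask.
                mask = masks.setdefault(color, [[0] * w for _ in range(h)])
                mask[y][x] = 1
    return masks


first = [[(0, 0, 0), None], [None, None]]           # e.g. line-segment part
second = [[None, (255, 0, 0)], [(0, 0, 0), None]]   # e.g. remaining part
masks = split_by_color(first, second)
```

Because black appears in both partial images, it ends up in a single mask, which is the point of merging the two results.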

The image data to be processed may be obtained by removing the character portions from the original image data and may be one of several partial images extracted from it. In that case, the plurality of color-specific image data may be generated from the first and second limited-color image data obtained from all of those partial images. Each color-specific image data again consists of the pixels judged to have the same color.

An image processing method according to one aspect of the present invention includes the following steps:
(a) extracting line-segment image elements from the image data to be processed, and generating first partial image data consisting of those line-segment image elements and second partial image data excluding them;
(b) determining at least one representative color based on the values of the pixels in the first partial image data, and generating first limited-color image data by limiting the colors of the first partial image data to the determined representative color(s); and
(c) determining at least one representative color based on the values of the pixels in the second partial image data, and generating second limited-color image data by limiting the colors of the second partial image data to the determined representative color(s).
The first and second limited-color image data thus obtained are subjected to predetermined compression processing.

An image processing program according to another aspect of the present invention causes a computer to execute the following procedures:
(a) extracting line-segment image elements from the image data to be processed, and generating first partial image data consisting of those line-segment image elements and second partial image data excluding them;
(b) determining at least one representative color based on the values of the pixels in the first partial image data, and generating first limited-color image data by limiting the colors of the first partial image data to the determined representative color(s); and
(c) determining at least one representative color based on the values of the pixels in the second partial image data, and generating second limited-color image data by limiting the colors of the second partial image data to the determined representative color(s).
The program also causes the first and second limited-color image data thus obtained to be subjected to predetermined compression processing.

As shown in FIG. 1, the image processing apparatus according to an embodiment of the present invention includes a control unit 11, a storage unit 12, an image input unit 13, and an image output unit 14. The control unit 11 operates according to a program stored in the storage unit 12 and performs the image processing described in detail below.

The storage unit 12 holds the program executed by the control unit 11. It also serves as a work memory for the various data generated while the control unit 11 carries out its processing. Specifically, the storage unit 12 can be implemented as a computer-readable recording medium together with a device that writes data to or reads data from that medium (for example, a hard disk drive or a memory device).

The image input unit 13 is, for example, a scanner. It outputs to the control unit 11 the image data obtained by optically reading a document. Here, each pixel of that image data is assumed to be expressed in the RGB (red, green, blue) color space. The image output unit 14 outputs image data according to instructions from the control unit 11. For example, it may send the data to an image forming unit (such as a printer) or transmit it to an external device over a network.

Next, the processing performed by the control unit 11 is described. As shown functionally in FIG. 2, the control unit 11 of this embodiment takes the image data input from the image input unit 13 as its processing target. It includes the following units:
- a preprocessing unit 21 that applies predetermined preprocessing to that image data;
- a picture candidate identification unit 22 that identifies portions that are candidates for picture portions (picture candidate portions);
- a character extraction unit 23 that extracts characters;
- a layout processing unit 24 that performs layout processing;
- a same-color region separation unit 25;
- a hole-filling unit 26; and
- a compression unit 27.

Each of these units is described in detail below.

[1. Preprocessing Unit]
The preprocessing unit 21 converts the value of each pixel of the image data input from the image input unit 13 (the target image data) from RGB to YCbCr, that is, to a luminance component and two color-difference components. Specifically, the conversion can be performed using equation (1). Here, each RGB component is assumed to take values from 0x00 ("0x" denotes hexadecimal) to 0xFF. The preprocessing unit 21 may also apply tone correction to each pixel value based on the luminance and saturation of the background region, although this tone correction is not essential.
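Equation (1) itself is not reproduced here. As an assumed stand-in, the sketch below uses the full-range ITU-R BT.601 conversion familiar from JFIF, with 8-bit components. Note that the document later treats a larger luminance value as darker, so the actual equation (1) may instead use an inverted luminance such as 255 - Y. The names `rgb_to_ycbcr` and `clamp8` are hypothetical.

```python
from typing import Tuple


def clamp8(v: float) -> int:
    """Round to the nearest integer and clamp to the 8-bit range 0x00-0xFF."""
    return max(0, min(255, round(v)))


def rgb_to_ycbcr(r: int, g: int, b: int) -> Tuple[int, int, int]:
    """Full-range BT.601 (JFIF) RGB -> YCbCr; an assumed substitute for equation (1)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return clamp8(y), clamp8(cb), clamp8(cr)
```

Neutral grays map to Cb = Cr = 128, so the luminance component alone carries the information used by the binarization step that follows.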

[2. Picture Candidate Region Identification Unit]
The picture candidate identification unit 22 identifies picture candidate regions, that is, regions of the YCbCr image data output by the preprocessing unit 21 that are estimated to be picture regions. Specifically, it first copies that image data into the storage unit 12; the copy is called the picture-region identification image data. It then examines each pixel of the copy and compares the pixel's luminance component with a predetermined binarization threshold. Pixels whose luminance is at or above the threshold become black (value "1"), and pixels whose luminance is below it become white (value "0"). This converts the picture-region identification image data into binary image data. Note that here a larger luminance component means a darker (blacker) pixel.
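The binarization rule above is simple enough to state directly in code. The sketch follows the document's convention that a larger luminance value is darker. With a conventional Y, where larger is brighter, one would pass 255 - Y instead; that adaptation is an assumption. The threshold value is not given in the source, and the name `binarize` is hypothetical.

```python
from typing import List


def binarize(luma: List[List[int]], threshold: int) -> List[List[int]]:
    """Return 1 (black) where the luminance value is >= threshold, else 0 (white).

    Follows the document's convention that a larger luminance value means a
    darker pixel; with a conventional Y (larger = brighter), pass 255 - Y.
    """
    return [[1 if v >= threshold else 0 for v in row] for row in luma]
```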

Next, the picture candidate identification unit 22 extracts regions of connected black pixels (connected regions) from this binary image data. The widely known labeling process can be used for this extraction. In that case, each connected region is assigned a distinct label identifier and stored in the storage unit 12.

A predetermined scale feature is then calculated for each connected region. The scale feature includes, for example, the area of a rectangle defined in relation to the connected region. Based on this scale feature, the unit judges whether each connected region is a picture candidate region.

Small regions are handled first. Consider the smallest font size used in typical documents (for example, 6 points); the area corresponding to it is the so-called minimum area. A region whose area is less than this minimum area (a small region) can be judged to contain no characters, and such small regions are removed.

Large regions are handled next. In typical documents, font sizes rarely exceed, for example, 24 points; the area corresponding to the largest font size in such documents is the so-called maximum area. A region whose area is equal to or larger than this maximum area is judged to contain no characters, that is, to be a picture region.

The picture candidate identification unit 22 then refers to the judgment result for each connected region. In the binarized picture-region identification image data, it sets to white every black pixel that belongs to a connected region judged not to be a picture region. This yields picture-region identification image data in which both the non-picture regions and the background are white.
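The labeling and area-based judgment in the two preceding paragraphs could be sketched as follows. The sketch rests on three assumptions, none of which the source fixes. First, connectivity is 8-neighbor. Second, the "rectangle defined in relation to the connected region" is taken to be its bounding box. Third, `min_area` and `max_area` are given in pixels; in practice they would be derived from the 6-point and 24-point sizes at the scan resolution. All function names and category strings are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive


def label_components(img: List[List[int]]) -> Tuple[List[List[int]], Dict[int, Box]]:
    """8-connected labeling of black (1) pixels; returns a label map and bounding boxes."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    boxes: Dict[int, Box] = {}
    next_label = 1
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or labels[y][x]:
                continue
            labels[y][x] = next_label
            queue = deque([(x, y)])
            x0, y0, x1, y1 = x, y, x, y
            while queue:  # breadth-first traversal of one connected region
                cx, cy = queue.popleft()
                x0, y0 = min(x0, cx), min(y0, cy)
                x1, y1 = max(x1, cx), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if 0 <= nx < w and 0 <= ny < h and img[ny][nx] == 1 and not labels[ny][nx]:
                            labels[ny][nx] = next_label
                            queue.append((nx, ny))
            boxes[next_label] = (x0, y0, x1, y1)
            next_label += 1
    return labels, boxes


def classify(boxes: Dict[int, Box], min_area: int, max_area: int) -> Dict[int, str]:
    """Classify each region by the area of its bounding box."""
    result = {}
    for label, (x0, y0, x1, y1) in boxes.items():
        area = (x1 - x0 + 1) * (y1 - y0 + 1)
        if area < min_area:
            result[label] = "small"       # removed
        elif area >= max_area:
            result[label] = "picture"     # too large to be a character
        else:
            result[label] = "text-sized"  # judged not to be a picture region
    return result


def keep_picture_pixels(img: List[List[int]], min_area: int, max_area: int) -> List[List[int]]:
    """Whiten every black pixel that does not belong to a 'picture' region."""
    labels, boxes = label_components(img)
    kinds = classify(boxes, min_area, max_area)
    return [[1 if lab and kinds[lab] == "picture" else 0 for lab in row] for row in labels]
```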

Next, the picture candidate identification unit 22 fills the white pixels connected to the boundary of the picture portion with a predetermined value (any color other than white or black). For this, it uses a widely known process that fills a closed region, i.e., the inside or outside of a closed curve.

The picture candidate identification unit 22 then sets to black every pixel that was not filled with the predetermined value. It stores the portion set to black in this way in the storage unit 12 as the picture candidate region.
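The fill-then-invert step in the two preceding paragraphs behaves like hole filling. In the sketch below, the fill is assumed to be seeded from the white pixels on the image border, uses 4-connectivity, and marks filled pixels with the value 2. These choices are one plausible reading, not something the source states. Any white pixel the fill cannot reach, such as a hole enclosed by a picture region, ends up black. `fill_from_border` is a hypothetical name.

```python
from collections import deque
from typing import List

FILLED = 2  # any marker other than white (0) and black (1)


def fill_from_border(img: List[List[int]]) -> List[List[int]]:
    """Flood-fill white pixels reachable from the image border, then make every
    pixel that was NOT filled black, so enclosed holes become part of the region."""
    h, w = len(img), len(img[0])
    work = [row[:] for row in img]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if (x in (0, w - 1) or y in (0, h - 1)) and work[y][x] == 0:
                work[y][x] = FILLED
                queue.append((x, y))
    while queue:
        cx, cy = queue.popleft()
        for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
            if 0 <= nx < w and 0 <= ny < h and work[ny][nx] == 0:
                work[ny][nx] = FILLED
                queue.append((nx, ny))
    # Filled pixels are background (white); everything else is the candidate region.
    return [[0 if v == FILLED else 1 for v in row] for row in work]
```

For a ring of black pixels with a white center, the center becomes black, so the whole enclosed area is treated as one picture candidate region.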

[3. Character Extraction Unit]
The character extraction unit 23 extracts character portions from the YCbCr image data output by the preprocessing unit 21. This can be done, for example, by the processing disclosed in Patent Document 2.

For each individual character, the character extraction unit 23 then stores in the storage unit 12 the coordinates of the rectangle circumscribing that character's pixels (the basic rectangle). This information identifies the character portions.

[4. Layout Processing Unit]
The layout processing unit 24 reads two sets of information from the storage unit 12: the picture candidate region information generated by the picture candidate identification unit 22, and the coordinates of the character bounding rectangles generated by the character extraction unit 23.

The layout processing unit 24 performs layout analysis both on the picture candidate regions defined by that information and on the character regions defined by the bounding-rectangle coordinates. It uses different judgment criteria for the two kinds of region.

Specifically, in this embodiment, layout analysis is performed on the picture candidate regions read from the storage unit 12 in an attempt to extract further character portions from them. The portion that remains after the extracted character portions are excluded is defined as the picture region.

One characteristic feature of this embodiment is that it uses layout analysis to extract character portions during so-called T/I separation. As a result, character portions contained in picture candidate regions are also extracted, which improves the accuracy of character extraction.

The layout processing unit 24 also performs layout analysis on the basic rectangles that the character extraction unit 23 identified as characters. It determines the layout frames obtained from this analysis, which include at least the rectangle circumscribing each character (the basic rectangle). It then stores information on these layout frames, such as their coordinates, in the storage unit 12.

The layout processing unit 24 refers to the character-portion information stored in the storage unit 12 and to the basic rectangles generated by the character extraction unit 23 (or to the layout frames obtained from them by layout analysis). From these, it generates the coordinates of the rectangular regions that contain characters across the entire target image data. Specifically, it integrates the character portions extracted from the picture candidate portions with those extracted by the character extraction unit 23, and thereby generates information defining the character regions.

The layout processing unit 24 then generates a unique region identifier (hereinafter called label data) for each merged character portion. It associates this label data with the coordinates that define the corresponding character region (such as its vertex coordinates), and stores the associations in the storage unit 12 as a character region database.

The layout processing unit 24 also generates layout frame information, namely the rectangles that surround the picture portions and character portions (that is, that circumscribe their pixels), and stores it in the storage unit 12.

[5. Same-Color Region Separation Unit]
The same-color region separation unit 25 separates both the image data of the character portions stored in the character region database and the image data of the remaining portions into regions by color.

[5a. Processing of Character Portions]
First, the processing of the character-portion image data is described. The same-color region separation unit 25 separates each character region stored in the character region database of the storage unit 12 into regions, each consisting only of character portions of the same color.

To do this, the unit reads the coordinates of the basic rectangles stored in the storage unit 12 during layout processing; these rectangles are defined from both the picture candidate regions and the character regions. It then determines candidate representative values (representative colors) from the values of the pixels of the target image data (the original image data) that lie within each basic rectangle.

As shown in FIG. 3(a), the same-color region separation unit 25 of this embodiment includes a representative color determination unit 51, a same-color region information generation unit 52, and an inclusion image creation unit 53. In the following description, "the pixels in a basic rectangle" may mean either of two things: all pixels of the original image data within the basic rectangle, or only those pixels within it that are judged to make up a character.

The representative color determination unit 51 refers to the basic-rectangle coordinates stored in the storage unit 12 and selects each basic rectangle in turn as the rectangle of interest. For each one, it determines at least one representative pixel value based on the pixel values within it.

One way to do this is to build a histogram (frequency of occurrence) of the pixel values of the original image data within the rectangle of interest. The representative value can then be either the single most frequent pixel value, or every pixel value whose frequency exceeds a predetermined threshold (for example, 1/3 of the number of pixels in the rectangle of interest).

The representative color determination unit 51 associates the determined representative pixel values with information identifying the rectangle of interest (an identifier issued uniquely to it). It stores these associations in the storage unit 12 as a representative pixel value database.
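The histogram-based choice described above can be sketched with `collections.Counter`. Of the two options the text offers, the sketch uses the frequency threshold (default 1/3 of the pixel count). It falls back to the single most frequent value when nothing exceeds that threshold; combining the two options in this way is an assumption. The database is modeled as a plain dict from rectangle identifier to representative values, and all names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Color = Tuple[int, int, int]


def representative_values(pixels: Sequence[Color], ratio: float = 1 / 3) -> List[Color]:
    """Pixel values whose frequency exceeds ratio * len(pixels); falls back to the mode."""
    counts = Counter(pixels)  # the histogram of pixel values
    limit = len(pixels) * ratio
    frequent = [v for v, n in counts.most_common() if n > limit]
    return frequent if frequent else [counts.most_common(1)[0][0]]


def build_representative_db(rects: Dict[int, Sequence[Color]]) -> Dict[int, List[Color]]:
    """Map each basic-rectangle identifier to its representative pixel value(s)."""
    return {rect_id: representative_values(px) for rect_id, px in rects.items()}
```

A rectangle containing black text with a few anti-aliased gray pixels would typically yield black as its only representative value.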

The same-color region information generation unit 52 groups the representative pixel values stored in the representative pixel value database by color, placing values judged to be the same color in the same group.

Specifically, it takes a representative pixel value that has not yet been grouped as the representative value of interest. It then selects, one after another, the other representative pixel values that have not yet been grouped, as comparison values. If one or more comparison values can be judged to be the same color as the representative value of interest, those comparison values and the representative value of interest are placed in one group.

Next, the unit 52 retrieves from the representative pixel value database the identifiers of the basic rectangles associated with the grouped values and builds a list of these identifiers. It also determines a single group representative pixel value from the representative value of interest and the comparison values. It then issues a unique group identifier and associates three items with one another: the group identifier, the group representative pixel value, and the list of basic-rectangle identifiers. It stores these associations in the storage unit 12 as a same-color region information database.

The group representative pixel value may be, for example, the result of a statistical operation (such as the average) over the representative value of interest and the comparison values judged to be the same color.

If, on the other hand, no comparison value can be judged to be the same color as the representative value of interest, a group consisting of that value alone is created. That is, the same-color region information generation unit 52 retrieves from the representative pixel value database the identifier of the basic rectangle associated with the representative value of interest, and uses the representative value of interest unchanged as the group representative pixel value. It then issues a unique group identifier and associates the group identifier, the group representative pixel value, and the retrieved basic-rectangle identifier with one another. It stores these associations in the storage unit 12 in the same-color region information database.
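The grouping loop described above can be sketched as a single greedy pass. "Judged to be the same color" is assumed here to mean a Euclidean distance of at most a hypothetical tolerance `tol`. The group representative is the rounded per-channel average, which is the example the text gives. The singleton case falls out naturally: a value with no matching comparison value forms a group of its own. The entry format and the function name are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Color = Tuple[int, int, int]


def group_representatives(
    entries: List[Tuple[int, Color]], tol: float
) -> Dict[int, Tuple[Color, List[int]]]:
    """Group (rect_id, representative color) entries whose colors are within `tol`.

    Returns {group_id: (group representative color, [rect_id, ...])}.
    """
    grouped = [False] * len(entries)
    groups: Dict[int, Tuple[Color, List[int]]] = {}
    next_id = 1
    for i, (_, color) in enumerate(entries):
        if grouped[i]:
            continue
        # Value of interest. Every earlier entry is already grouped, so only
        # later, still-ungrouped entries need to be compared.
        grouped[i] = True
        members = [i]
        for j in range(i + 1, len(entries)):
            if not grouped[j] and math.dist(color, entries[j][1]) <= tol:
                grouped[j] = True
                members.append(j)
        colors = [entries[k][1] for k in members]
        # Group representative: the per-channel average, rounded.
        rep = tuple(round(sum(c[d] for c in colors) / len(colors)) for d in range(3))
        groups[next_id] = (rep, [entries[k][0] for k in members])
        next_id += 1
    return groups
```

`math.dist` requires Python 3.8 or later. Note that the result depends on processing order, which matches the text's one-value-of-interest-at-a-time description.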

In this way, the same-color region information generation unit 52 generates same-color region information for each grouped representative pixel value (group representative pixel value). This information associates that value with the information defining the corresponding basic regions (here, their identifiers). The unit stores it in the storage unit 12 as the same-color region information database.

The inclusion image creation unit 53 refers to the representative pixel value database and retrieves the basic-region identifiers it contains. Referring to the basic rectangle database, it then obtains the coordinates associated with each retrieved identifier, i.e., the information defining each basic region. From these, it generates a region that encloses all of the basic regions.

Specifically, among the X and Y coordinate values given by the obtained coordinates, the unit finds the smallest X value Xmin, the smallest Y value Ymin, the largest X value Xmax, and the largest Y value Ymax. It then generates information defining the rectangle whose diagonal runs from (Xmin, Ymin) to (Xmax, Ymax), that is, the rectangle circumscribing all of the enclosed basic rectangles.
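The Xmin/Ymin/Xmax/Ymax search above amounts to taking the union bounding box of the selected basic rectangles. In this sketch the basic rectangle database is assumed to be a dict from identifier to an inclusive (x0, y0, x1, y1) tuple. Both this layout and the name `enclosing_rectangle` are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive


def enclosing_rectangle(rect_db: Dict[int, Box], rect_ids: Iterable[int]) -> Box:
    """Rectangle with diagonal (Xmin, Ymin)-(Xmax, Ymax) circumscribing the given rectangles."""
    boxes = [rect_db[i] for i in rect_ids]
    return (
        min(b[0] for b in boxes),  # Xmin
        min(b[1] for b in boxes),  # Ymin
        max(b[2] for b in boxes),  # Xmax
        max(b[3] for b in boxes),  # Ymax
    )
```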

The inclusion image creation unit 53 extracts the pixels inside each enclosed basic area from the original image data. It binarizes them using a threshold derived from a background pixel value, which may be determined, for example, from the pixel values at the four corners of the original image data. The binarized image data is generated as the inclusion image data.
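The binarization step can be sketched in a few lines. The text leaves open how the threshold is derived from the background value, so this sketch makes two assumptions:

- The background is the median of the four corner luminances.
- A pixel is ON when it differs from that background by more than a hypothetical `offset`.

Only pixels inside basic rectangles are examined. Everything else in the enclosing rectangle stays OFF.

```python
from statistics import median
from typing import List, Sequence, Tuple

Image = List[List[int]]           # grayscale luminance, indexed image[y][x]
Rect = Tuple[int, int, int, int]  # inclusive (x0, y0, x1, y1)


def background_value(img: Image) -> float:
    """Estimate the background from the four corner pixels (median)."""
    h, w = len(img), len(img[0])
    return median([img[0][0], img[0][w - 1], img[h - 1][0], img[h - 1][w - 1]])


def binarize_inclusion(img: Image, rects: Sequence[Rect], enclosing: Rect,
                       offset: int = 64) -> List[List[int]]:
    """Binarize only the pixels inside the basic rectangles; a pixel is ON (1)
    when it differs from the background by more than `offset` (hypothetical)."""
    bg = background_value(img)
    ex0, ey0, ex1, ey1 = enclosing
    bitmap = [[0] * (ex1 - ex0 + 1) for _ in range(ey1 - ey0 + 1)]
    for x0, y0, x1, y1 in rects:
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                if abs(img[y][x] - bg) > offset:
                    bitmap[y - ey0][x - ex0] = 1
    return bitmap
```

Using the absolute difference from the background handles both dark-on-light and light-on-dark text. A real system might instead fix the polarity.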

This inclusion image data is bitmap information obtained by binarizing the value of each pixel in the basic areas it encloses. It therefore contains information indicating the positions of the pixels that are to be set to one of the representative pixel values (or group representative pixel values). The inclusion image creation unit 53 may also compress this position information, such as the binarized bitmap, using a predetermined method (for example, MMR or JBIG2).

As another example of processing for character portions, the same color region separation unit 25 of the control unit 11 may perform the processing shown in FIG. 3(b). This processing is functionally composed of a representative color determination unit 51, a same color area information generation unit 52, an inclusion image creation unit 53, a color-specific inclusion image creation unit 54, and a comparison unit 55. The representative color determination unit 51, the same color area information generation unit 52, and the inclusion image creation unit 53 operate as already described.

The color-specific inclusion image creation unit 54 generates inclusion image data for each group. The comparison unit 55 compares two candidates:

- The first image data: character plane data containing these per-group inclusion image data (color-specific inclusion areas).
- The second image data: character plane data containing the inclusion image data and the same color area information database.

It selects whichever of the two is smaller in size and outputs it as the character plane data.

The same color area information generation unit 52 may decide whether two representative pixel values can be regarded as the same by computing the sum of the squared differences between their corresponding components. This sum is a quantity related to the distance between the two values in a predetermined color space. The two values are judged to be the same when the sum is smaller than a predetermined threshold.
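The same-color test, together with the grouping described earlier, can be sketched as a single pass over the representative pixel values. The text shown here covers only the no-match case, where a new group takes the value as its representative. In the match case, this sketch assumes (hypothetically) that the group keeps its first member's value as the group representative. The identifiers and the threshold value are illustrative.

```python
from typing import Dict, List, Tuple

Color = Tuple[int, int, int]


def same_color(c1: Color, c2: Color, threshold: float) -> bool:
    """Sum of squared component differences compared with a preset threshold."""
    return sum((a - b) ** 2 for a, b in zip(c1, c2)) < threshold


def group_representatives(reps: Dict[str, Color], threshold: float):
    """reps maps a basic-rectangle identifier to its representative pixel value.
    Returns (group_id, group representative, [basic-rectangle ids]) tuples."""
    groups: List[list] = []
    for rect_id, color in reps.items():
        for g in groups:
            if same_color(color, g[1], threshold):
                g[2].append(rect_id)  # assumption: representative stays unchanged
                break
        else:
            # no comparison value matched: new group, value used as-is
            groups.append([len(groups), color, [rect_id]])
    return [tuple(g) for g in groups]
```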

Next, the operation of the same color region separation unit 25 of the present embodiment will be described with reference to FIG. 4, using the image shown in FIG. 4(a) as a concrete example. In the first line of FIG. 4(a), “昨日は雨でした” (“It rained yesterday”), the two characters “昨日” (“yesterday”) are red and the remaining characters “は雨でした” are black. The whole of the next line, “今日は曇です” (“Today is cloudy”), is black. At a position apart from these two lines is the string “お天気の話でした” (“That was a story about the weather”), in which only the character “話” (“story”) is red. For convenience of illustration, the red characters in FIG. 4(a) are enclosed in broken lines; these broken lines do not appear in the actual image.

In this case, a rectangle circumscribing the pixel cluster of each character (a basic rectangle) is defined by, among other things, the processing of the layout processing unit 24. The representative color determination unit 51 then determines a representative color for each basic rectangle. The representative colors it determines are not necessarily identical even when individual characters were originally set to the same color (pixel value), as with the two characters “昨” and “日” of “昨日”. Representative pixel values that are close to each other but not equal may be determined. In this example, the representative color determination unit 51 determines representative pixel values close to red for the basic rectangles circumscribing the three characters “昨”, “日”, and “話”. It determines representative pixel values close to black for the basic rectangles circumscribing the other characters.

The same color area information generation unit 52 groups together the basic rectangles whose representative pixel values are judged to be the same. As a result, the information defining the basic rectangles circumscribing “昨”, “日”, and “話” forms one group, associated with a single group representative pixel value that is also close to red. The information defining the basic rectangles circumscribing the other characters forms another group, associated with a single group representative pixel value that is also close to black.

The inclusion image creation unit 53 generates a rectangle enclosing all of the target basic rectangles, regardless of group. It then binarizes the pixel values inside each basic rectangle and outputs the resulting image as the inclusion image data, which becomes the second image data (FIG. 4(b)).

The same color area information database, containing at least one pair of information defining a basic rectangle and a group representative pixel value, is then associated with the inclusion image data. The two are output together as character plane data.

Meanwhile, the color-specific inclusion image creation unit 54 generates inclusion image data for each group, called color-specific inclusion image data (FIG. 4(c)). The pixel values in the color-specific inclusion image data need not be binarized. If they are binarized, the group representative pixel value of the corresponding group is associated with each color-specific inclusion image and included in the character plane data. The comparison unit 55 then outputs whichever of two candidates is smaller in data size: the character plane data generated by the color-specific inclusion image creation unit 54 (the first image data), or the second image data. The output is stored in the storage unit 12 as the character plane data.
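The comparison unit's choice can be sketched as "encode both candidates, keep the smaller". The text does not say which encoder determines the size. Here zlib from the Python standard library stands in for the MMR/JBIG2-style coders mentioned earlier, and the candidates are assumed to be already serialized to bytes. Both points are assumptions for illustration.

```python
import zlib
from typing import Tuple


def select_smaller(first: bytes, second: bytes) -> Tuple[str, bytes]:
    """Compress both candidate encodings and keep the smaller one (ties go to
    the first). zlib is only a stand-in for the coder actually used."""
    c1, c2 = zlib.compress(first), zlib.compress(second)
    return ("first", c1) if len(c1) <= len(c2) else ("second", c2)
```

Comparing sizes after compression matters because a layout with many small color planes may look larger raw but compress better, or vice versa.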

[5b. Processing for non-character areas]
Next, the processing of regions other than characters will be described. As shown in FIG. 5, this processing in the same color region separation unit 25 is functionally composed of the following units:

- a preprocessing unit 60
- an edge extraction unit 61
- a line segment extraction unit 62
- a line segment image element specifying unit 63
- a line segment image element removal unit 64
- a first color extraction unit 65
- a second color extraction unit 66
- a limited colorization unit 67
- a region integration unit 68

The preprocessing unit 60 reads the character plane data from the storage unit 12. It removes the pixels included in the character plane data from the original image data, generating image data for the regions other than characters. The preprocessing unit 60 also reads from the storage unit 12 the rectangle information surrounding the picture and line-drawing portions (layout frame information). From the non-character image data, it extracts the image data inside each rectangle defined by the layout frame information and stores it in the storage unit 12 as the image data to be processed.

This yields images of the picture and line-drawing portions enclosed by the layout frames, with the character portions removed. For example, for a graph such as the one shown in FIG. 6(a), the image with the characters removed (FIG. 6(b)) is stored in the storage unit 12 as the image data to be processed.

The edge extraction unit 61 works on the image data to be processed stored in the storage unit 12. It computes, among mutually adjacent pixels, the difference between the maximum and minimum luminance as a luminance difference, and then computes the standard deviation of this luminance difference to extract edges. The line segment extraction unit 62 detects, in the same image data, portions where the difference between adjacent pixel values is equal to or greater than a predetermined threshold, and extracts line segments. Both edge extraction and line segment extraction are widely known, so a detailed description is omitted.
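The edge-extraction description leaves the exact rule open. The sketch below is one plausible reading under two assumptions:

- "Adjacent pixels" means the 3x3 neighbourhood of each pixel, and the luminance difference is the max minus the min over that neighbourhood.
- A pixel is an edge when its difference exceeds `k` times the standard deviation of all such differences. The factor `k` is a hypothetical parameter.

```python
from statistics import pstdev
from typing import List, Set, Tuple

Image = List[List[int]]  # luminance, image[y][x]


def local_range(img: Image, x: int, y: int) -> int:
    """Max minus min luminance over the 3x3 neighbourhood of (x, y)."""
    h, w = len(img), len(img[0])
    vals = [img[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]
    return max(vals) - min(vals)


def extract_edges(img: Image, k: float = 1.0) -> Set[Tuple[int, int]]:
    """Mark as edges the pixels whose local range exceeds k standard
    deviations of the local-range distribution (k is hypothetical)."""
    h, w = len(img), len(img[0])
    diffs = [[local_range(img, x, y) for x in range(w)] for y in range(h)]
    thr = k * pstdev([d for row in diffs for d in row])
    return {(x, y) for y in range(h) for x in range(w) if diffs[y][x] > thr}
```

On a uniform image the standard deviation is zero and no pixel exceeds it, so no edges are reported. A thin dark line marks the line itself plus its immediate neighbours.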

The line segment image element specifying unit 63 generates information specifying the pixel group made up of the pixels extracted as edges and the pixels extracted as line segments. It outputs this information as line segment image element data. For example, extracting the pixels forming the edges and line segments from FIG. 6(b) yields the pixel group shown in FIG. 6(c). This pixel group is the line segment image element data and corresponds to the first partial image data of the present invention.

The line segment image element removal unit 64 generates line segment image element removal data. It does so by removing the pixel group contained in the line segment image element data from the image data output by the preprocessing unit 60. This removal data corresponds to the second partial image data of the present invention; an example is shown in FIG. 6(d).

The first color extraction unit 65 extracts representative colors from the pixel values of the pixels in the line segment image element data. As shown in FIG. 7, it includes a color number determination unit 71 and a representative color determination unit 72. The color number determination unit 71 reduces the pixel values in the line segment image element data to a limited set of colors by a method such as median cut, and outputs the number of colors remaining after this reduction. Alternatively, it may place the pixel values in a color space, divide that space into a plurality of parts by clustering, and take the number of parts as the number of colors. The number obtained either way represents the number of significant colors and corresponds to the significant number of pixel values in the present invention.

When the number of colors output by the color number determination unit 71 does not exceed a predetermined threshold (the first threshold), the representative color determination unit 72 outputs information on that many representative colors. If the color number determination unit 71 reduced the colors by median cut, each pixel value obtained by the reduction is output as-is as information representing a representative color. If clustering is used, the unit computes the centroid of the pixel values in each divided part and outputs the color corresponding to that centroid as representative color information.
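Of the two options above, the clustering variant is the easier one to sketch. This sketch uses simple sequential ("leader") clustering, which is one clustering method among many and not necessarily the one intended. The cluster radius `radius2` (a squared distance) and the color threshold `max_colors` are hypothetical parameters. The number of clusters serves as the number of significant colors, and the centroids serve as representative colors. `None` signals that color limitation is judged impossible.

```python
from typing import List, Optional, Sequence, Tuple

Color = Tuple[float, ...]


def dist2(a: Color, b: Color) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_colors(pixels: Sequence[Color], radius2: float = 900.0) -> List[Color]:
    """Leader clustering: a pixel joins the first cluster whose current
    centroid lies within sqrt(radius2), otherwise it opens a new cluster.
    Returns the centroid of each cluster."""
    sums: List[List[float]] = []
    counts: List[int] = []
    for p in pixels:
        for k in range(len(sums)):
            centroid = tuple(s / counts[k] for s in sums[k])
            if dist2(p, centroid) <= radius2:
                sums[k] = [s + v for s, v in zip(sums[k], p)]
                counts[k] += 1
                break
        else:
            sums.append(list(p))
            counts.append(1)
    return [tuple(s / n for s in acc) for acc, n in zip(sums, counts)]


def representative_colors(pixels: Sequence[Color], max_colors: int,
                          radius2: float = 900.0) -> Optional[List[Color]]:
    """Return cluster centroids as representative colors, or None when the
    number of significant colors exceeds max_colors."""
    centroids = cluster_colors(pixels, radius2)
    return centroids if len(centroids) <= max_colors else None
```

Leader clustering depends on pixel order. A production system would more likely use median cut or k-means, as the text suggests.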

Like the first color extraction unit 65, the second color extraction unit 66 extracts representative colors, in this case from the pixel values of the pixels in the line segment image element removal data. It reduces these pixel values to a limited set of colors by a method such as median cut and counts the colors that remain. When this number does not exceed a predetermined threshold (the second threshold), it outputs each pixel value obtained by the reduction as-is as information representing a representative color.

The limited colorization unit 67 compares the pixel value of each pixel in the line segment image element data with the representative values output by the first color extraction unit 65. It generates image data in which each pixel value is replaced by the closest of those representative values; this is the first limited color image data. In the same way, it compares the pixel value of each pixel in the line segment image element removal data with the representative values output by the second color extraction unit 66. It generates image data in which each pixel value is replaced by the closest of those representative values; this is the second limited color image data.

If the number of colors output by the color number determination unit 71 exceeds the predetermined threshold (the first threshold), the representative color determination unit 72 of the first color extraction unit 65 determines that color limitation is not possible. In this case, the limited colorization unit 67 outputs the line segment image element data unchanged, without limiting its colors.

Similarly, if the number of colors in the line segment image element removal data exceeds the predetermined threshold (the second threshold), the second color extraction unit 66 determines that color limitation is not possible. The limited colorization unit 67 then outputs the line segment image element removal data unchanged, without limiting its colors.
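The nearest-representative mapping performed by the limited colorization unit can be sketched as follows. The sketch assumes squared Euclidean distance as "closest", consistent with the distance measure used elsewhere in this description. It also assumes that a `None` representative list encodes the "color limitation not possible" case, in which the pixels pass through unchanged.

```python
from typing import List, Optional, Sequence, Tuple

Color = Tuple[float, ...]


def limit_colors(pixels: Sequence[Color],
                 reps: Optional[Sequence[Color]]) -> List[Color]:
    """Replace every pixel by its nearest representative (squared distance).
    If reps is None (too many colors), return the pixels unchanged."""
    if reps is None:
        return list(pixels)
    return [min(reps, key=lambda r: sum((a - b) ** 2 for a, b in zip(p, r)))
            for p in pixels]
```

The same function applies to both the line segment image element data (with the first unit's representatives) and the removal data (with the second unit's).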

In this case, the first color extraction unit 65 may further determine whether the line segment image element data contains a region whose colors can be partially limited. One way to decide this:

1. Compute a histogram of the pixel values.
2. For pixel values whose frequency exceeds a predetermined threshold, count the pixels whose values can be regarded as substantially the same color (for example, within about ±16 of each other).
3. Check whether this count exceeds a predetermined proportion of the number of pixels in the line segment image element data.

If such a region exists, the first color extraction unit 65 outputs information specifying that region together with its representative color value. The limited colorization unit 67 then limits the colors of the specified portion of the line segment image element data to that representative color.

Similarly, the second color extraction unit 66 may determine whether the line segment image element removal data contains a region whose colors can be partially limited. If it does, the second color extraction unit 66 outputs information specifying that region and its representative color value. The limited colorization unit 67 then limits the colors of the specified portion of the line segment image element removal data to that representative color.
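The histogram test for partial color limitation is sketched below under stated assumptions:

- It works on a single luminance value per pixel rather than full color.
- Histogram bins whose frequency exceeds `min_freq` are taken as candidate colors.
- A pixel counts as "substantially the same color" when it lies within ±`tol` (16 by default, following the example in the text) of a candidate.
- The region qualifies when the best candidate's pixels exceed `ratio` of all pixels.

`min_freq` and `ratio` are hypothetical parameters.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def partial_limitable(values: Sequence[int], min_freq: int, tol: int = 16,
                      ratio: float = 0.5) -> Optional[Tuple[int, List[int]]]:
    """Return (peak value, indices of pixels within +/-tol of it) when that
    group exceeds `ratio` of all pixels, else None."""
    best = None
    for peak, freq in Counter(values).most_common():
        if freq <= min_freq:
            break  # remaining bins are no more frequent than this one
        members = [i for i, v in enumerate(values) if abs(v - peak) <= tol]
        if best is None or len(members) > len(best[1]):
            best = (peak, members)
    if best is not None and len(best[1]) > ratio * len(values):
        return best
    return None
```

The returned indices play the role of the "information specifying that region", and the peak value plays the role of its representative color.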

The region integration unit 68 generates a plurality of pieces of color-specific image data. Each piece consists of the pixels, taken from the first and second limited color image data, that are judged to be of one and the same color. The unit compares the representative color information group C1a, C1b, ... output by the first color extraction unit 65 with the representative color information group C2a, C2b, ... output by the second color extraction unit 66, and searches for representative colors judged to be the same. For example, it selects each of C1a, C1b, ... in turn as the representative color of interest C1x. It then computes the distance in color space (the sum of the squared differences between corresponding components) between C1x and each of C2a, C2b, .... If some representative color C2y lies at a distance below a predetermined same-color threshold, C1x and C2y are judged to be the same color.

Next, the region integration unit 68 separates the pixels of representative color C1x from the first limited color image data, and likewise separates the pixels of representative color C2y from the second limited color image data. It then generates and outputs color-specific image data consisting of the pixels separated from both. The region integration unit 68 repeats this for each pair of representative colors judged to be the same, producing one piece of color-specific image data per pair.

As a result of this process, the pixels judged to be of a shared color are removed from the first and second limited color image data. Consider an example:

- The first limited color image data contains pixels with the values C1a and C1b.
- The second limited color image data contains pixels with the values C2a and C2b.
- C1a and C2b are judged to be the same color.

In this case, image data combining the C1a pixels of the first limited color image data and the C2b pixels of the second limited color image data is generated as color-specific image data. The C1b pixels remain in the first limited color image data, and the C2a pixels remain in the second limited color image data.
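The pairing-and-merging procedure above can be sketched as follows. Each limited-color plane is assumed to be a dict from pixel coordinates to the index of its representative color. The sketch also assumes that each second-plane color is paired at most once, and that the merged plane takes the first plane's color. The text does not settle either point.

```python
from typing import Dict, List, Sequence, Set, Tuple

Color = Tuple[int, int, int]
Plane = Dict[Tuple[int, int], int]  # (x, y) -> index of its representative color


def dist2(c1: Color, c2: Color) -> int:
    return sum((a - b) ** 2 for a, b in zip(c1, c2))


def integrate(plane1: Plane, reps1: Sequence[Color],
              plane2: Plane, reps2: Sequence[Color], threshold: int):
    """Merge same-color pixels of the two limited-color planes.
    Returns (color-specific planes, remaining plane1, remaining plane2)."""
    rest1, rest2 = dict(plane1), dict(plane2)
    color_planes: List[Tuple[Color, Set[Tuple[int, int]]]] = []
    used: Set[int] = set()
    for i, c1 in enumerate(reps1):          # C1x: representative of interest
        for j, c2 in enumerate(reps2):      # candidate C2y
            if j in used or dist2(c1, c2) >= threshold:
                continue
            used.add(j)
            pix1 = [p for p, k in rest1.items() if k == i]
            pix2 = [p for p, k in rest2.items() if k == j]
            for p in pix1:                  # separate C1x pixels from plane 1
                del rest1[p]
            for p in pix2:                  # separate C2y pixels from plane 2
                del rest2[p]
            color_planes.append((c1, set(pix1) | set(pix2)))
            break
    return color_planes, rest1, rest2
```

With C1a matched to C2b, this returns one color-specific plane and leaves the C1b and C2a pixels in their respective planes, as in the example above.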

In the above description, matching colors are sought only between the first and second limited color image data obtained from a single layout frame, one of several extracted from the original image data. When such a color exists, the pixels of that color within that layout frame are extracted to generate color-specific image data. However, the method of generating color-specific image data is not limited to this. Color-specific image data may instead be generated from the entire set of first and second limited color image data for all layout frames, each piece consisting of the pixels of one color judged to be the same.

Furthermore, when the character plane data contains color-specific inclusion image data, the region integration unit 68 may proceed as follows. It compares the color information associated with each color-specific inclusion image data with the representative color of each color-specific image data. If the two can be judged to be the same color (for example, by their distance in color space), it merges that color-specific image data into the corresponding color-specific inclusion image data.

In this way, from the non-character regions, the same color region separation unit 25 generates the following:

- the color-specific image data (if any);
- for each layout frame, the first limited color image data (or the line segment image element data);
- for each layout frame, the second limited color image data (or the line segment image element removal data).

These image data are hereinafter referred to as line segment pattern plane data.

The control unit 11 retrieves three items: the inclusion image data obtained from the character regions, the same color area information database, and coordinate information indicating the position of the inclusion image data in the original image data. It outputs data containing this information as character/line drawing plane data. To reproduce the original image from character/line drawing plane data generated in this way:

1. Decompress the compressed data to recover the bitmap of the inclusion image data.
2. For each group in the same color area information database, identify the black (ON) pixels of the recovered bitmap that lie within that group's basic rectangles.
3. Set the values of those pixels to the group's representative pixel value.
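The reproduction steps can be sketched as below, after decompression. This sketch makes three assumptions:

- The bitmap is a list of 0/1 rows whose top-left pixel sits at `origin` in page coordinates.
- Each group is a pair (group representative pixel value, list of basic rectangles in page coordinates).
- Pixels not painted by any group receive a hypothetical `background` color.

```python
from typing import List, Sequence, Tuple

Color = Tuple[int, int, int]
Rect = Tuple[int, int, int, int]  # inclusive (x0, y0, x1, y1) in page coordinates


def reconstruct(bitmap: List[List[int]], origin: Tuple[int, int],
                groups: Sequence[Tuple[Color, Sequence[Rect]]],
                background: Color = (255, 255, 255)) -> List[List[Color]]:
    """Paint the ON pixels inside each group's basic rectangles with that
    group's representative value; all other pixels stay background."""
    ox, oy = origin
    out = [[background] * len(bitmap[0]) for _ in bitmap]
    for rep, rects in groups:
        for x0, y0, x1, y1 in rects:
            for y in range(y0, y1 + 1):
                for x in range(x0, x1 + 1):
                    if bitmap[y - oy][x - ox]:
                        out[y - oy][x - ox] = rep
    return out
```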

The coordinate information defining the basic rectangles in the same color area information database may express positions in the original image data. Alternatively, it may be converted into positions within the inclusion image data (for example, coordinates relative to the lower-right corner of the inclusion image data).

The same color area information database need not contain information defining every basic rectangle. The largest group is the group containing the most basic rectangles. Its group representative pixel value may be associated with the inclusion image data shown in FIG. 4(b), and the information on that group may then be removed from the same color area information database.

In this case, the side that reproduces the original image data first recovers the bitmap of the inclusion image data. It sets every black (ON) pixel in the bitmap to the group representative pixel value associated with the inclusion image data, which is that of the largest group. Then, for each group in the same color area information database, it resets the ON pixels in the bitmap regions corresponding to that group's basic rectangles to that group's representative pixel value.

Instead of the group representative pixel value of the largest group, the group representative pixel value closest to a predetermined color (for example, black) may be associated with the inclusion image data shown in FIG. 4(b). The information on the group having that representative value is then removed from the same color area information database.
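Both variants can be sketched as an encoder-side selection plus a decoder that paints ON pixels with the default color before repainting the remaining groups. The selection uses either the largest group or the group whose representative is closest to a target color such as black. The group layout (representative, list of basic rectangles) and the `mode` and `target` parameters are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Color = Tuple[int, int, int]
Rect = Tuple[int, int, int, int]  # inclusive (x0, y0, x1, y1) in page coordinates
Group = Tuple[Color, List[Rect]]  # (group representative, basic rectangles)


def split_default_group(groups: Sequence[Group], mode: str = "largest",
                        target: Color = (0, 0, 0)) -> Tuple[Color, List[Group]]:
    """Choose the group whose color is attached to the whole inclusion image
    and drop it from the list: 'largest' = most basic rectangles,
    otherwise the representative nearest to `target`."""
    if mode == "largest":
        idx = max(range(len(groups)), key=lambda g: len(groups[g][1]))
    else:
        idx = min(range(len(groups)),
                  key=lambda g: sum((a - b) ** 2 for a, b in zip(groups[g][0], target)))
    return groups[idx][0], [g for k, g in enumerate(groups) if k != idx]


def decode_with_default(bitmap: List[List[int]], origin: Tuple[int, int],
                        default: Color, groups: Sequence[Group],
                        background: Color = (255, 255, 255)) -> List[List[Color]]:
    """Paint every ON pixel with the default color, then repaint the ON
    pixels inside the remaining groups' rectangles with their own colors."""
    ox, oy = origin
    out = [[default if on else background for on in row] for row in bitmap]
    for rep, rects in groups:
        for x0, y0, x1, y1 in rects:
            for y in range(y0, y1 + 1):
                for x in range(x0, x1 + 1):
                    if bitmap[y - oy][x - ox]:
                        out[y - oy][x - ox] = rep
    return out
```

Dropping the most populous group saves the most rectangle entries, which is presumably why the largest group is the first option mentioned.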

Furthermore, to allow for variation of the pixel values within a basic rectangle, the same color region separation unit 25 may apply smoothing before determining the representative pixel values. One form of smoothing takes each pixel in each basic rectangle in turn as the pixel of interest. It replaces that pixel's value with the average of its own value and the values of its adjacent pixels.

In this smoothing, only the pixels that form the character within the basic rectangle may be selected as pixels of interest (for example, the pixels that become black after binarization). The average may likewise be computed from the values of character pixels only. Because non-character pixel values are then never referenced, the representative color of the character is not influenced by the background color.
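The character-only smoothing can be sketched as follows. This sketch makes three assumptions:

- "Adjacent pixels" means the 8-neighbourhood.
- The character mask comes from a prior binarization.
- Non-character pixels are left untouched.

```python
from typing import List

Image = List[List[float]]  # luminance, image[y][x]
Mask = List[List[int]]     # 1 = character pixel (e.g. black after binarization)


def smooth_char_pixels(img: Image, mask: Mask) -> Image:
    """Replace each character pixel by the mean of itself and its
    8-neighbours that are also character pixels. Background pixels are
    neither smoothed nor referenced."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))
                    if mask[j][i]]
            out[y][x] = sum(vals) / len(vals)
    return out
```

Because the background value of 200 in the test case never enters the averages, the smoothed character pixels stay at the mean of the character values alone.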

When the representative value is determined after smoothing, it may be corrected, for example with a luminance tone curve (correction function) such as the one shown in FIG. 8. The input is the luminance of the representative value before correction. The curve maps inputs from the minimum value MIN up to a first threshold TH1 to the output MIN. It maps inputs from a second threshold TH2 (where TH2 > TH1) up to the maximum value MAX to the output MAX. The curve may also be set so that an input equal to the midpoint MID between MAX and MIN yields an output of approximately MID; for example, MID is 128 when the maximum value is 255 and the minimum value is 0.

That is, the same color region separation unit 25 corrects the luminance component Y of the representative value to Y′ using the tone curve of FIG. 8. It then outputs the value specified by Y′ together with the color difference components Cb and Cr of the representative value candidate as the corrected representative value.
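FIG. 8 itself is not reproduced here, so the shape of the curve between TH1 and TH2 is an assumption. The sketch below joins the two clamped segments with a straight line. This satisfies the "MID maps to about MID" condition when TH1 and TH2 are placed symmetrically about MID. The default thresholds 64 and 192 and both function names are hypothetical.

```python
from typing import Tuple


def tone_curve(y: float, th1: float, th2: float,
               y_min: float = 0.0, y_max: float = 255.0) -> float:
    """Piecewise-linear luminance correction (assumed shape for FIG. 8)."""
    if y <= th1:
        return y_min  # MIN..TH1 is clamped to MIN
    if y >= th2:
        return y_max  # TH2..MAX is clamped to MAX
    # Linear ramp between the thresholds (assumption; the real curve may differ).
    return y_min + (y - th1) * (y_max - y_min) / (th2 - th1)


def correct_representative(ycbcr: Tuple[float, float, float],
                           th1: float = 64.0,
                           th2: float = 192.0) -> Tuple[float, float, float]:
    """Correct only Y; keep the candidate's Cb and Cr unchanged."""
    y, cb, cr = ycbcr
    return (tone_curve(y, th1, th2), cb, cr)
```

With these values, a dark representative luminance such as 40 becomes 0 (pure black), and an input of 128 maps to 127.5, close to MID.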

Although only the luminance is corrected above, the color difference components may be taken into account as well. Specifically, when each color difference component of the representative value satisfies a predetermined condition, the same color region separation unit 25 may reduce the number of gradations of the representative color's luminance component and adopt the corrected value as the representative value.

For example, as shown in FIG. 9, the condition may be that the color difference components (a*, b*) of a representative value expressed in the L*a*b* color space each lie within a predetermined range of the center value of that component's value range (inside the circle defined by THa and THb in FIG. 9). When this holds, the luminance component L, expressed with 256 gradations, is reduced to a predetermined number of gradations such as 4 or 8. In this case, the color difference components may also be set to the center value. The predetermined ranges THa and THb may be equal or different.

This processing yields a representative value that reproduces the character's true color, particularly when the character is gray (including black). For example, when the character is black, its color difference components and luminance component should ideally be 0. However, depending on the characteristics of the scanner or of the encoding format of the original image data (for example, JPEG), the color difference components or the luminance component may deviate from 0. The color difference processing described here restores the representative value to true black.
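A minimal sketch of the gray-snapping correction follows, under several assumptions. L*, a* and b* are taken to be stored as 8-bit values (0 to 255) with a* and b* centered at 128, a common 8-bit Lab encoding. The region test is written as an ellipse with semi-axes THa and THb, which reduces to the circle of FIG. 9 when the two are equal. Four output levels are used. The function name and default values are hypothetical.

```python
from typing import Tuple

Lab = Tuple[float, float, float]


def snap_gray(lab: Lab, th_a: float = 10.0, th_b: float = 10.0,
              center: float = 128.0, levels: int = 4) -> Lab:
    """Reduce L to `levels` gradations for near-neutral representative colors.

    Assumes 8-bit encoded L*, a*, b* with a*/b* centred at `center`.
    The region test is an ellipse with semi-axes th_a and th_b, which is the
    circle of FIG. 9 when th_a == th_b.
    """
    l, a, b = lab
    da, db = a - center, b - center
    if (da / th_a) ** 2 + (db / th_b) ** 2 > 1.0:
        return lab  # chromatic color: leave unchanged
    step = 255.0 / (levels - 1)  # 4 levels -> 0, 85, 170, 255
    l_q = round(l / step) * step
    return (l_q, center, center)  # also snap a*, b* to the neutral centre
```

In this sketch, a slightly tinted, slightly lifted "black" such as (20, 131, 126) is restored to (0, 128, 128). A clearly chromatic color passes through unchanged.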

Here the correction is applied after smoothing to determine the representative value. The order may instead be reversed. In that case, the correction is first applied to each pixel, the pixel values are then smoothed, a histogram is computed, and the representative value is determined from it.
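The text does not specify how the representative value is read from the histogram. One plausible reading, sketched below purely as an assumption, takes the most populated bin and averages the values that fall into it. The sketch handles a single non-negative channel. The bin width of 8 and the function name are hypothetical.

```python
from collections import Counter
from typing import Sequence


def representative_from_histogram(values: Sequence[float],
                                  bin_width: int = 8) -> float:
    """Mean of the values in the most frequent histogram bin (assumed rule)."""
    hist = Counter(int(v) // bin_width for v in values)
    best_bin, _ = hist.most_common(1)[0]
    in_bin = [v for v in values if int(v) // bin_width == best_bin]
    return sum(in_bin) / len(in_bin)
```

A single outlier such as a stray background pixel (200 among values near 12) does not shift the result, because it falls into a different bin.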

[6. Hole Filling Processing Unit]
The hole filling processing unit 26 performs hole filling on the parts of the line segment pattern plane data output by the same color region separation unit 25 that were not color-limited (the line segment image element data or the line segment image element exclusion data).

That is, each pixel of this image data is scanned in raster scan order. If the target pixel selected by the scan is not a removed pixel, its value is left unchanged. That value is also stored as the previous pixel value in the work memory of the storage unit 12, overwriting any previous pixel value already stored there.

If the target pixel selected by the scan is a removed pixel, its value is set to the stored previous pixel value. As a result, each removed pixel takes the same value as the pixel preceding it in raster scan order, which improves compression efficiency for many compression methods.
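The raster-order hole filling can be sketched as follows, assuming removed pixels are represented as `None`. The text does not say what value a removed pixel receives when no pixel has yet been kept (for example, at the very start of the image). The sketch therefore uses a hypothetical `initial` value (white). The function name is also hypothetical.

```python
from typing import List, Optional


def fill_holes(img: List[List[Optional[int]]],
               initial: int = 255) -> List[List[int]]:
    """Fill removed pixels (None) with the previous kept value in raster order.

    `initial` is used if the very first pixels are removed; the source does
    not cover this case, so the white default is an assumption.
    """
    prev = initial  # plays the role of the stored "previous pixel value"
    out: List[List[int]] = []
    for row in img:  # raster order: rows top to bottom...
        new_row: List[int] = []
        for v in row:  # ...pixels left to right
            if v is None:
                new_row.append(prev)  # removed pixel takes the stored value
            else:
                prev = v  # overwrite the stored previous pixel value
                new_row.append(v)
        out.append(new_row)
    return out
```

Because the stored value carries over row boundaries, a removed pixel at the start of a row takes the last kept value of the row above. This matches a pure raster-order scan.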

[7. Compression Processing Unit]
The compression processing unit 27 handles the line segment pattern plane data stored in the storage unit 12 and the character plane data as follows:
- It applies JPEG compression to the image data that was not color-limited (the line segment image element data or the line segment image element exclusion data), generating first compressed line segment pattern plane data.
- It compresses the color-limited image data (the color-specific image data, the first limited color image data, and the second limited color image data) with a method such as run-length compression or ZIP, generating second compressed line segment pattern plane data.
- It compresses the character plane data with a method such as run-length compression or ZIP, generating compressed character plane data.

The first compressed line segment pattern plane data, the second compressed line segment pattern plane data, and the compressed character plane data are then combined, for example by the compression processing unit 27, and written out as PDF (Portable Document Format) data. The compression processing unit 27 either stores the generated PDF data in the storage unit 12 or outputs it to the image output unit 14 for transmission to an external device.

[Operation]
Next, the operation of the image processing apparatus according to this embodiment is described, using the example shown in FIG. 10. A document is input from the image input unit 13, and its image data is processed. The document contains character portions (T1, T2), a photograph portion (P), and a map portion (M) that serves as the line drawing portion.

In FIG. 10, part of the text (T2) is superimposed on the photograph portion (P). The map portion (M) contains road lines interleaved with characters. The figure is shown in black and white for convenience. In practice, the road lines and the characters of the map portion are drawn in different colors, and the photograph may be in color.

The preprocessing unit 21 converts the pixel values of the image data into values in a predetermined color space (YCbCr). The pattern candidate portion specifying processing unit 22 binarizes this image data (the original image data) and removes small regions from the result. This produces image data from which the character portion (T1) has been removed. Most characters are removed at this stage (some may partially remain). Characters superimposed on the photograph portion, however, remain identified as part of a pattern candidate portion.
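The "binarize, then remove small regions" step can be sketched as follows. The sketch assumes that dark pixels are foreground, that regions are 4-connected, and that "small" means fewer than `min_area` pixels. The threshold values and function names are hypothetical; the text gives no concrete values.

```python
from collections import deque
from typing import List

Grid = List[List[int]]


def binarize(lum: List[List[float]], threshold: float) -> Grid:
    """1 for dark (foreground) pixels, 0 otherwise."""
    return [[1 if v < threshold else 0 for v in row] for row in lum]


def remove_small_components(binary: Grid, min_area: int) -> Grid:
    """Clear 4-connected foreground regions smaller than min_area pixels."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    out = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            component = []
            while queue:  # breadth-first flood fill of one region
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) < min_area:  # small region: treat as character
                for cy, cx in component:
                    out[cy][cx] = 0
    return out
```

Large connected regions, such as a photograph, survive as pattern candidates. Character-sized specks are cleared.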

The character extraction processing unit 23 binarizes the original image data and extracts character portions, for example by identifying small regions. In doing so, it divides the original image data into a plurality of regions and adaptively sets a binarization threshold for each region (for example, by the method disclosed in Patent Document 2). This makes it possible to extract characters even from colored areas such as maps.

The layout processing unit 24 then performs layout analysis within the pattern candidate portions and extracts the character portion (T2) remaining in them. The layout processing unit 24 also separates the image data into document elements and determines a circumscribing rectangle (layout frame) for each element. In the example of FIG. 10, layout frames are defined for the character portion (T1), the map portion (M), and the photograph portion (P). Because the character portion (T2) is contained in the photograph portion (P), no layout frame needs to be determined for it.
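Per-region adaptive binarization can be sketched as follows. The method of Patent Document 2 is not reproduced here. As a stand-in assumption, each square tile uses its own mean luminance as the threshold. The tile size and function name are hypothetical.

```python
from typing import List


def adaptive_binarize(lum: List[List[float]], block: int) -> List[List[int]]:
    """Binarize with one threshold per block x block tile.

    The per-tile threshold is the tile's mean luminance (an assumption, not
    the method of Patent Document 2). Pixels darker than it become 1.
    """
    h, w = len(lum), len(lum[0])
    out = [[0] * w for _ in range(h)]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            cells = [(y, x)
                     for y in range(by, min(by + block, h))
                     for x in range(bx, min(bx + block, w))]
            threshold = sum(lum[y][x] for y, x in cells) / len(cells)
            for y, x in cells:
                out[y][x] = 1 if lum[y][x] < threshold else 0
    return out
```

For example, `adaptive_binarize([[200, 100, 50, 60], [200, 200, 50, 20]], 2)` splits the image into two tiles. The right-hand tile stands in for a dark, colored map area. A single global threshold of 128 would mark every pixel in that tile as foreground, whereas the local threshold isolates only its darkest pixel.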

The same color region separation unit 25 then processes the character portions extracted by the character extraction processing unit 23 and the layout processing unit 24 (both T1 and T2). It determines a representative color for each character and searches the determined representative colors for sets of colors judged to be the same. For each such set, it groups the characters whose representative colors belong to that set.

It then generates image data containing the extracted character portions. It reduces the size of the pixel value information of this image data, for example by binarizing it or converting it to grayscale, to obtain inclusion image data. Finally, it generates character plane data containing:
- the group representative color of each group;
- coordinate information defining the basic rectangles (rectangles circumscribing individual characters) in each group;
- the inclusion image data;
- coordinate information specifying the position of the inclusion image data in the original image data.
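The text does not define when two representative colors count as "the same". The sketch below assumes a Euclidean distance threshold measured against the first color (the seed) of each group. It takes the mean of the member colors as the group representative color. The function name and threshold are hypothetical, and the greedy seed-based grouping depends on input order.

```python
import math
from typing import List, Sequence, Tuple

Color = Tuple[float, ...]


def group_by_color(colors: Sequence[Color],
                   threshold: float) -> List[Tuple[Color, List[int]]]:
    """Group character indices whose representative colors are 'the same'.

    Two colors count as the same if their Euclidean distance to a group's
    seed color is <= threshold (assumed criterion). Returns, per group, the
    mean color (used here as the group representative color) and the members.
    """
    groups: List[Tuple[Color, List[int]]] = []
    for i, c in enumerate(colors):
        for seed, members in groups:
            if math.dist(seed, c) <= threshold:
                members.append(i)
                break
        else:  # no existing group matched: start a new one
            groups.append((c, [i]))
    result = []
    for seed, members in groups:
        n = len(members)
        mean = tuple(sum(colors[m][k] for m in members) / n
                     for k in range(len(seed)))
        result.append((mean, members))
    return result
```

For example, three near-black YCbCr representative colors fall into one group, and a distinct colored character forms a second group. `math.dist` requires Python 3.8 or later.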

Meanwhile, the same color region separation unit 25 extracts from the original image data the partial image data inside the layout frames other than the character portions. Here, these are the map portion (M) and the photograph portion (P). It then removes the character portions extracted by the character extraction processing unit 23 and the layout processing unit 24 from each piece of partial image data. This removes the character portion (T2) contained in the photograph portion (P), as well as any characters contained in the map portion.

Next, line segment image elements (edges and the line segments themselves) are detected in each piece of partial image data. For each piece, the same color region separation unit 25 generates two outputs: line segment image element data containing only the detected line segment image elements, and line segment image element exclusion data obtained by removing those elements.
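Given a mask of detected line segment image elements, the split into the two planes is straightforward. In this sketch, the detection itself (edge and line extraction) is assumed to have happened already and is represented only by the input mask. Pixels absent from a plane are marked `None`. The function name is hypothetical.

```python
from typing import Any, List, Optional, Tuple

Plane = List[List[Optional[Any]]]


def split_line_elements(img: List[List[Any]],
                        mask: List[List[bool]]) -> Tuple[Plane, Plane]:
    """Split one partial image into line-element data and exclusion data.

    `mask` marks detected line segment image elements (edges and lines);
    producing it is outside this sketch. Pixels absent from a plane are None.
    """
    elements = [[v if m else None for v, m in zip(row, mrow)]
                for row, mrow in zip(img, mask)]
    excluded = [[None if m else v for v, m in zip(row, mrow)]
                for row, mrow in zip(img, mask)]
    return elements, excluded
```

The `None` entries in the exclusion plane are the removed pixels that hole filling later overwrites with the preceding value.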

The same color region separation unit 25 then checks whether representative colors can be determined for the line segment image element data. If so, it determines them and generates a color-limited version of the line segment image element data. In this example, color limiting is assumed to succeed for the line segment image element data of both the map portion (M) and the photograph portion (P).

The same color region separation unit 25 likewise checks whether representative colors can be determined for the line segment image element exclusion data. If so, it determines them and generates a color-limited version of that data. In this example, the line segment image element exclusion data can be color-limited only for the map portion (M).
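The "can representative colors be determined?" check is not spelled out here; the claims speak of comparing a "significant number" of pixel values with a threshold. The sketch below rests on an assumed interpretation. A color counts as significant if it occurs at least `min_count` times. Color limiting fails (returns `None`) when there are more significant colors than `max_colors`. Otherwise every pixel is mapped to its nearest significant color. All names and defaults are hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Color = Tuple[int, int, int]


def limit_colors(pixels: Sequence[Optional[Color]], max_colors: int,
                 min_count: int = 2
                 ) -> Optional[Tuple[List[Color], List[Optional[Color]]]]:
    """Try to color-limit one plane; return None if it is not possible.

    Assumed interpretation: a color is 'significant' if it occurs at least
    `min_count` times; if the number of significant colors exceeds
    `max_colors` (the threshold), no representative colors are determined.
    Otherwise every pixel is mapped to its nearest significant color.
    """
    counts = Counter(p for p in pixels if p is not None)
    palette = [c for c, n in counts.most_common() if n >= min_count]
    if not palette or len(palette) > max_colors:
        return None
    mapped = [None if p is None else min(palette, key=lambda c: math.dist(c, p))
              for p in pixels]
    return palette, mapped
```

Under this reading, a flat-colored map plane yields a small palette and is color-limited. A photograph, whose pixel values are spread over many colors, exceeds the threshold and is left for JPEG.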

For each piece of color-limited data, the same color region separation unit 25 retrieves the set of representative colors used to limit it. It then searches across these sets for colors judged to be the same. For example, suppose one representative color of the map portion's (M) line segment image element data and one representative color of the photograph portion's (P) line segment image element data are judged to be the same color. The pixels of that representative color are then separated from both pieces of line segment image element data to generate new image data (color-specific image data). The separated pixels are removed from each piece of line segment image element data.
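Separating a color shared by two color-limited planes can be sketched as follows. The sketch makes three assumptions. "Same color" is again taken to mean a Euclidean distance within `threshold`. The color-specific data is represented as the shared color plus one pixel mask per source plane, since the two portions differ in size. The shared color is taken from the first plane. The function name is hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Color = Tuple[int, int, int]
Plane = List[Optional[Color]]


def extract_shared_colors(plane_a: Sequence[Optional[Color]],
                          plane_b: Sequence[Optional[Color]],
                          threshold: float):
    """Separate colors shared by two color-limited planes.

    For each color of plane_a within `threshold` (Euclidean distance, an
    assumed criterion) of some color of plane_b, build color-specific data as
    (color, mask over plane_a, mask over plane_b) and remove those pixels
    from both planes. Returns (color_specific, remaining_a, remaining_b).
    """
    a: Plane = list(plane_a)
    b: Plane = list(plane_b)
    colors_b = sorted({p for p in b if p is not None})
    color_specific = []
    for ca in sorted({p for p in a if p is not None}):
        cb = next((c for c in colors_b if math.dist(ca, c) <= threshold), None)
        if cb is None:
            continue  # color appears only in plane_a
        mask_a = [p == ca for p in a]
        mask_b = [p == cb for p in b]
        color_specific.append((ca, mask_a, mask_b))
        a = [None if m else p for p, m in zip(a, mask_a)]
        b = [None if m else p for p, m in zip(b, mask_b)]
    return color_specific, a, b
```

Because a single-color layer is fully described by its color and a mask, it compresses well with run-length or ZIP coding.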

The photograph portion's (P) line segment image element exclusion data was not color-limited. In this data, the hole filling processing unit 26 sets each pixel that was removed (for example, as part of a character) to the value of the nearest preceding non-removed pixel in scan-line order.

The compression processing unit 27 applies JPEG compression to the photograph portion's (P) line segment image element exclusion data, which was not color-limited. It applies, for example, run-length compression to the color-limited color-specific image data, to each piece of line segment image element data, and to the character plane data. It combines these compression results into PDF data and outputs it to the image output unit 14, which in turn outputs the PDF data to an external device.

The compression processing unit 27 may also reduce the image size of the line segment image element exclusion data (reduction processing) before JPEG compression to further improve the compression ratio.
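The reduction step can be as simple as averaging 2×2 blocks. The factor of 2, box averaging, and the dropping of a trailing odd row or column are all assumptions; the text fixes none of them. A real implementation might instead use a library resampler such as Pillow's `Image.resize`. The function name is hypothetical.

```python
from typing import List


def downscale_2x(img: List[List[float]]) -> List[List[float]]:
    """Halve width and height by averaging each 2x2 block.

    A trailing odd row or column is dropped for simplicity (assumption).
    """
    h = len(img) // 2 * 2
    w = len(img[0]) // 2 * 2
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]
```

Halving both dimensions leaves a quarter of the pixels to JPEG-encode. The loss of detail is tolerable here because the sharp line elements and characters have already been moved to the lossless planes.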

In this embodiment, line segment image elements contained in picture regions also undergo processing such as color limiting. A compression method suited to their properties, such as run-length compression, can therefore be applied to them, improving both compression efficiency and image quality.

FIG. 1 is a block diagram showing an example configuration of an image processing apparatus according to an embodiment of the present invention.
FIG. 2 is a functional block diagram showing the processing executed by the control unit of the image processing apparatus according to the embodiment.
FIG. 3 is a functional block diagram showing an example of the processing that the same color region separation unit applies to character regions.
FIG. 4 is an explanatory diagram showing an example of processing by the same color region separation unit.
FIG. 5 is a functional block diagram showing an example of the processing that the same color region separation unit applies to non-character regions.
FIG. 6 is an explanatory diagram showing an example of processing by the same color region separation unit.
FIG. 7 is a functional block diagram showing an example of a color extraction unit.
FIG. 8 is an explanatory diagram showing an example of the tone curve used in the same color region separation unit 25.
FIG. 9 is an explanatory diagram showing the conditions for the correction processing in the same color region separation unit 25.
FIG. 10 is an explanatory diagram showing an example of an image processed by the image processing apparatus according to the embodiment.

Explanation of symbols

11 control unit, 12 storage unit, 13 image input unit, 14 image output unit, 21 preprocessing unit, 22 pattern candidate portion specifying processing unit, 23 character extraction processing unit, 24 layout processing unit, 25 same color region separation unit, 26 hole filling processing unit, 27 compression processing unit, 51 representative color determination unit, 52 same color region information generation unit, 53 inclusion image creation unit, 54 color-specific inclusion image creation unit, 55 comparison unit, 60 preprocessing unit, 61 edge extraction unit, 62 line segment extraction unit, 63 line segment image element specifying unit, 64 line segment image element removal unit, 65 first color extraction unit, 66 second color extraction unit, 67 limited colorization unit, 68 region integration unit, 71 color count determination unit, 72 representative color determination unit.

Claims

An image processing apparatus comprising:
means for extracting line segment image elements from image data to be processed, and generating first partial image data composed of the line segment image elements and second partial image data excluding the line segment image elements;
first limited colorization means for determining at least one representative color based on the values of pixels included in the first partial image data, and generating first limited color image data in which the colors of the first partial image data are limited using the determined representative color; and
second limited colorization means for determining at least one representative color based on the values of pixels included in the second partial image data, and generating second limited color image data in which the colors of the second partial image data are limited using the determined representative color,
wherein the first limited color image data and the second limited color image data are subjected to a predetermined compression process.

The image processing apparatus according to claim 1, wherein
the first limited colorization means calculates a significant number of the pixel values included in the first partial image data and, only when the significant number does not exceed a predetermined first threshold, determines at least one representative color and generates first limited color image data in which the colors of the first partial image data are limited using the determined representative color.

The image processing apparatus according to claim 1 or 2, wherein
the second limited colorization means calculates a significant number of the pixel values included in the second partial image data and, only when the significant number does not exceed a predetermined second threshold, determines at least one representative color and generates second limited color image data in which the colors of the second partial image data are limited using the determined representative color.

The image processing apparatus according to any one of claims 1 to 3, further comprising
means for generating, based on the first limited color image data and the second limited color image data, a plurality of pieces of color-specific image data. Each piece contains the pixels of one color among the pixels of the first limited color image data and the second limited color image data that are determined to be the same color.

An image processing method comprising:
a step of extracting line segment image elements from image data to be processed, and generating first partial image data composed of the line segment image elements and second partial image data excluding the line segment image elements;
a step of determining at least one representative color based on the values of pixels included in the first partial image data, and generating first limited color image data in which the colors of the first partial image data are limited using the determined representative color; and
a step of determining at least one representative color based on the values of pixels included in the second partial image data, and generating second limited color image data in which the colors of the second partial image data are limited using the determined representative color,
wherein the first limited color image data and the second limited color image data thus obtained are subjected to a predetermined compression process.

An image processing program causing a computer to execute:
a procedure for extracting line segment image elements from image data to be processed, and generating first partial image data composed of the line segment image elements and second partial image data excluding the line segment image elements;
a procedure for determining at least one representative color based on the values of pixels included in the first partial image data, and generating first limited color image data in which the colors of the first partial image data are limited using the determined representative color; and
a procedure for determining at least one representative color based on the values of pixels included in the second partial image data, and generating second limited color image data in which the colors of the second partial image data are limited using the determined representative color,
wherein the first limited color image data and the second limited color image data thus obtained are subjected to a predetermined compression process.