JPH10232926A

JPH10232926A - Image processor and its method

Info

Publication number: JPH10232926A
Application number: JP9335995A
Authority: JP
Inventors: Takeshi Makita; 剛蒔田
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1996-12-20
Filing date: 1997-12-05
Publication date: 1998-09-02

Abstract

PROBLEM TO BE SOLVED: To provide an image processor and method that process an image by setting a proper quantization threshold between the object density and the background density in the input image. SOLUTION: When a multilevel image is quantized and processed, a luminance frequency accumulation part 103 calculates the luminance frequency of the multilevel image, a quantization threshold calculation part 104 specifies a quantization threshold according to the calculated luminance frequency, and a quantization part 105 calculates representative values used for quantizing the multilevel image according to the specified quantization threshold and the luminance frequency, and quantizes the multilevel image using the calculated representative values.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an image processing apparatus and method that, for example, determine a quantization threshold for a multi-valued image and perform quantization.

[0002]

[Description of the Related Art] Image processing technology has advanced remarkably in recent years, and image processing apparatuses capable of processing multi-valued images such as full-color images, and of recognizing characters within multi-valued images, have become widespread. In such image processing, binarization of multi-valued images is an indispensable technique.

[0003] Conventional binarization methods include simple binarization with a fixed, preset threshold; the Otsu method, which takes as the binarization threshold the value that maximizes the between-class variance when the histogram is divided into two classes ("Automatic Threshold Selection Method Based on Discriminant and Least Squares Criteria" (Otsu), Transactions of the Institute of Electronics and Communication Engineers of Japan, Vol. J63-D, No. 4, pp. 349-356, 1980); and binarization methods that set the threshold according to local density for images having gradation.

[0004]

[Problems to Be Solved by the Invention] However, the binarization methods used in the conventional image processing apparatuses described above have the following problems.

[0005] With simple binarization using a fixed threshold, it is difficult to set an appropriate threshold between the object density and the background density in the image; as a result, the whole image may be crushed to black or, conversely, turn white. The Otsu method has the property that, when the distributions of the two classes differ greatly, the threshold is drawn toward the larger class, so a noisy binary image is generated. Binarization that sets the threshold according to local density divides the image into local areas and is therefore prone to block distortion. Moreover, even if the optimum threshold can be identified, binarization discards grayscale information such as the background of the original image and its characters.

[0006] The present invention has been made to solve the above problems, and an object thereof is to provide an image processing apparatus and method that perform image processing by setting an appropriate quantization threshold between the object density and the background density in an input image.

[0007] A further object of the present invention, in solving the above problems, is to provide an image processing apparatus and method that identify an optimum quantization threshold and can perform region separation without losing grayscale information such as the background of the original image and its characters.

[0008]

[Means for Solving the Problems] To achieve the above objects, the present invention provides an image processing apparatus that quantizes a multi-valued image and performs image processing, comprising: first calculating means for calculating a luminance frequency of the multi-valued image; specifying means for specifying a quantization threshold based on the calculated luminance frequency; second calculating means for calculating representative values used for quantizing the multi-valued image based on the specified quantization threshold and the luminance frequency; and quantizing means for quantizing the multi-valued image using the calculated representative values.

[0009] To achieve the above objects, the present invention also provides an image processing method that quantizes a multi-valued image and performs image processing, comprising: calculating a luminance frequency of the multi-valued image; specifying a quantization threshold based on the calculated luminance frequency; calculating representative values used for quantizing the multi-valued image based on the specified quantization threshold and the luminance frequency; and quantizing the multi-valued image using the calculated representative values.

[0010]

[Embodiments of the Invention] Embodiments of the present invention are described in detail below with reference to the accompanying drawings.

[0011] FIG. 1 is a block diagram showing the configuration of an image processing system that executes the quantization processing according to an embodiment of the present invention. In the figure, reference numeral 1 denotes an image processing apparatus that performs character recognition processing, 2 denotes an image input device, such as a scanner, for inputting an image, and 3 denotes an image display device that displays the processed image.

[0012] In the image processing apparatus 1, reference numeral 101 denotes an input unit serving as an interface with the image input device 2, 102 denotes a storage unit such as a memory that stores data being processed, and 103 denotes a luminance frequency accumulation unit that accumulates the luminance frequency (histogram) of the input image. Reference numeral 104 denotes a quantization threshold calculation unit that calculates the quantization threshold of the input image, and 105 denotes a quantization unit that creates a quantized image using the threshold calculated by the quantization threshold calculation unit 104.

[0013] Reference numeral 106 denotes a region separation unit that separates the image into regions by attribute, 107 denotes a character recognition unit that performs character recognition on the regions extracted as character regions by this region separation, 108 denotes an image processing unit that performs various kinds of image processing on the regions separated as other than character regions, and 109 denotes an output unit serving as an interface with the image display device 3. These components are controlled overall by a control unit (not shown) consisting of a CPU that controls the entire apparatus 1, a ROM that stores the CPU's programs and the like, and a RAM in which the work area, tables, and the like used by the CPU during processing are defined.

[0014] The OCR processing executed in the image processing apparatus of the present embodiment, configured as described above, is explained below.

[0015] FIG. 2 is a flowchart showing image area separation OCR processing that uses the quantization threshold determination method according to the present embodiment.

[0016] First, in step S201, the input unit 101 inputs image data from the image input device 2, such as a scanner, and stores it in the storage unit 102. The image data is input as 8-bit multi-valued image data. Next, in step S202, for the multi-valued image input in step S201, the quantization threshold calculation unit 104 determines the quantization threshold best suited to the image area separation described later, and the quantization unit 105 generates a quantized image using this threshold. In step S203, the region separation unit 106 performs image area separation on the quantized image generated in step S202 and outputs region data, with attributes attached, to the image processing unit 108. In step S204, for the region data separated in step S203, the image processing unit 108 binarizes the regions designated as "text" and then cuts them out of the binary image. The character recognition unit 107 then performs OCR processing on this binary image and outputs the recognized character codes.

[0017] <Description of Quantization Processing> The quantization processing in the present embodiment is described next.

[0018] FIG. 3 is a flowchart showing the procedure of the quantization processing in the present embodiment. First, in step S301, an 8-bit multi-valued image is read from the storage unit 102 in the image processing apparatus 1 into a memory or the like (not shown), and in step S302 it is extracted one processing block (64 × 64 pixels) at a time. The multi-valued image is assumed to have been read by the image input device 2, such as a scanner, and stored in the storage unit 102 in advance. In step S303, the luminance frequency accumulation unit 103 calculates a histogram for each processing block. Using all pixels of the processing block, it calculates the frequency of each 8-bit digital value, that is, each value from "0" to "255". This yields, for example, a histogram such as the one shown in FIG. 6.
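The block extraction and histogram accumulation of steps S302 and S303 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the luminance frequency accumulation unit 103: the image is assumed to be a list of rows of 8-bit integers, the function names `iter_blocks` and `luminance_histogram` are hypothetical, and the handling of edge blocks smaller than 64 × 64 pixels is a choice the text does not specify.

```python
def iter_blocks(image, size=64):
    """Yield (top, left, pixels) for each size x size processing block.

    Assumption: image is a list of rows of ints in 0..255. Blocks on the
    right or bottom edge may hold fewer pixels (not specified in the text).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    for top in range(0, height, size):
        for left in range(0, width, size):
            pixels = [image[y][x]
                      for y in range(top, min(top + size, height))
                      for x in range(left, min(left + size, width))]
            yield top, left, pixels


def luminance_histogram(pixels):
    """Count how often each digital value 0..255 occurs (step S303)."""
    hist = [0] * 256
    for value in pixels:
        hist[value] += 1
    return hist


# Usage: a 64 x 128 image whose left half is black and right half is white.
image = [[0] * 64 + [255] * 64 for _ in range(64)]
blocks = list(iter_blocks(image))
left_hist = luminance_histogram(blocks[0][2])
right_hist = luminance_histogram(blocks[1][2])
```

Each block yields its own 256-bin histogram, so the threshold search that follows can run independently per block.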

[0019] Next, in step S304, the parameters START and END are set to "0" and "255", respectively. The parameters START and END correspond to the start point and end point of the range over which the luminance statistics are computed in the subsequent steps S305 and S306.

[0020] In step S305, the average value AV of the pixels whose digital values lie between START and END is calculated. For example, if START = 0 and END = 255, the average value AV of the pixels having values from "0" to "255" (in this case, all pixels) is calculated; if START = 0 and END = 177, the average value AV of the pixels having values from "0" to "177" is calculated.

[0021] In step S306, the skew value Sk of the pixels whose luminance values lie between START and END is calculated. The skew value is a statistic that indicates the bias of the histogram distribution. It is calculated using equation (1) below.

[0022]

Sk = (Σ (Xi - AV)^3) / D   ... (1)

Here, "^" denotes exponentiation and Xi is the luminance value of a pixel. D is the variance value of the entire image and is calculated by equation (2) below.

[0023]

D = Σ (Xi - AV)^2   ... (2)

In equation (1) above, the skew value is calculated by cubing the difference between each pixel's luminance value and the average value; however, the exponent is not limited to three and may be any odd power.
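Equations (1) and (2) can be evaluated directly on a list of pixel values, as in the sketch below. It is illustrative only: the function name `mean_and_skew` is hypothetical, D is computed over the same pixels that are passed in, and returning a skew of 0 when D is 0 (all pixels equal) is an assumption added to avoid division by zero, which the text does not address.

```python
def mean_and_skew(pixels):
    """Return (AV, Sk) for a non-empty list of luminance values.

    Sk = sum((Xi - AV)^3) / D   ... equation (1)
    D  = sum((Xi - AV)^2)       ... equation (2)
    """
    av = sum(pixels) / len(pixels)
    d = sum((x - av) ** 2 for x in pixels)
    if d == 0:
        # Assumption: a flat distribution is treated as unbiased.
        return av, 0.0
    sk = sum((x - av) ** 3 for x in pixels) / d
    return av, sk


# Usage: three dark pixels and one brighter pixel give a positive skew,
# while a symmetric pair gives a skew of zero.
biased = mean_and_skew([0, 0, 0, 4])
balanced = mean_and_skew([10, 200])
```

A positive Sk indicates that the long tail lies above the average, and a negative Sk that it lies below, which is what the sign tests in steps S307 and S308 rely on.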

[0024] In the subsequent steps S307 and S308, the direction of the histogram bias is determined. First, in step S307, the direction of the bias is judged using expression (3) below. This determines whether the histogram is biased toward the range of values smaller than the average value AV.

[0025]

Sk < -1.0   ... (3)

In step S307, if expression (3) is true for the calculated skew value, the process proceeds to step S312; if it is false, the process proceeds to step S308. In step S312, START is left unchanged and END is set to the average value AV. The process then returns to step S305, and the average value AV over the range from START to END is calculated again.

[0026] In step S308, the direction of the histogram bias is judged using expression (4) below. This determines whether the histogram is biased toward the range of values larger than the average value AV.

[0027]

Sk > 1.0   ... (4)

In step S308, if expression (4) is true for the calculated skew value, the process proceeds to step S313; if it is false, the process proceeds to step S309. In step S313, START is set to the average value AV and END is left unchanged. The process then returns to step S305, and the average value AV over the range from START to END is calculated again.

[0028] In step S309, the average value AV obtained when the conditions in steps S307 and S308 are both false is set as the quantization threshold TH. Then, in step S310, quantization is performed using the quantization threshold TH.

[0029] In step S311, it is determined whether the current block is the last processing block (64 × 64 pixels) of the input image. If it is, the processing ends; if unprocessed blocks remain, the process returns to step S302.
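The threshold search of steps S304 to S309 can be sketched for a single block histogram as follows. This is a hedged illustration rather than the patented implementation. The names `range_stats` and `quantization_threshold` are hypothetical, the average is truncated to an integer before being stored in START or END (the text does not state a rounding rule), D is computed over the current START to END range, and the extra stop condition for a range that no longer shrinks is a safeguard added here, not a step of FIG. 3.

```python
def range_stats(hist, start, end):
    """Return (AV, Sk) over digital values start..end, or None if empty."""
    n = sum(hist[start:end + 1])
    if n == 0:
        return None
    av = sum(v * hist[v] for v in range(start, end + 1)) / n
    d = sum(hist[v] * (v - av) ** 2 for v in range(start, end + 1))
    if d == 0:
        return av, 0.0  # assumption: a single-valued range has no bias
    sk = sum(hist[v] * (v - av) ** 3 for v in range(start, end + 1)) / d
    return av, sk


def quantization_threshold(hist, limit=1.0):
    """Narrow [START, END] until -limit <= Sk <= limit, then return AV."""
    start, end = 0, 255                      # step S304
    while True:
        stats = range_stats(hist, start, end)
        if stats is None:                    # empty block: arbitrary fallback
            return (start + end) // 2
        av, sk = stats                       # steps S305 and S306
        if sk < -limit:                      # expression (3), step S312
            new_range = (start, int(av))
        elif sk > limit:                     # expression (4), step S313
            new_range = (int(av), end)
        else:
            return int(av)                   # step S309: TH = AV
        if new_range == (start, end):        # safeguard against stalling
            return int(av)
        start, end = new_range


def make_hist(counts):
    """Helper for the examples: build a 256-bin histogram from a dict."""
    hist = [0] * 256
    for value, count in counts.items():
        hist[value] = count
    return hist


th_balanced = quantization_threshold(make_hist({10: 50, 200: 50}))
th_skewed = quantization_threshold(make_hist({200: 90, 20: 10}))
```

In the second example the first pass finds a strongly negative skew, so END is pulled down to the average and the search continues inside the dark tail.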

[0030] This quantization is explained with reference to FIG. 4.

[0031] In the histogram calculated in step S303, the region of values smaller than the quantization threshold TH calculated in step S309 is called BB and, conversely, the region of values larger than TH is called WB. Ordinarily, binarization is performed by setting the representative value of the BB region to 0 and that of the WB region to 1. In that case, however, the gray information is lost.

[0032] In the present embodiment, therefore, the average value BBV of the BB region and the average value WBV of the WB region are calculated, and the image is quantized using these two average values, BBV and WBV.

[0033] As a result, as shown at 501 in FIG. 5, the multi-valued information of the image region is expressed by just two multi-valued levels. As shown at 502 in FIG. 5, the quantized data may instead take the form of a bitmap in which the area represented by BBV is replaced with 0 and the area represented by WBV is replaced with 1, with header information 503 holding BBV and WBV attached. BBV and WBV are not limited to average values; the median of each of the BB and WB regions may be used instead.
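The bitmap-plus-header form of the quantized data (502 and 503 in FIG. 5) can be sketched as below. The sketch makes several assumptions that the text leaves open: pixels equal to TH are assigned to the BB side, the averages are rounded to integers, an empty side falls back to TH, and the header is represented as a plain dictionary. The function names `quantize_block` and `reconstruct` are hypothetical.

```python
def quantize_block(pixels, threshold):
    """Return (bitmap, header) for one block.

    bitmap: 0 for the BB (dark) side, 1 for the WB (light) side.
    header: the representative values BBV and WBV (here, rounded means).
    """
    dark = [p for p in pixels if p <= threshold]   # assumption: ties go to BB
    light = [p for p in pixels if p > threshold]
    bbv = round(sum(dark) / len(dark)) if dark else threshold
    wbv = round(sum(light) / len(light)) if light else threshold
    bitmap = [0 if p <= threshold else 1 for p in pixels]
    return bitmap, {"BBV": bbv, "WBV": wbv}


def reconstruct(bitmap, header):
    """Rebuild the two-level gray image (501 in FIG. 5) from bitmap + header."""
    return [header["WBV"] if bit else header["BBV"] for bit in bitmap]


# Usage: two dark and two light pixels with a threshold of 128.
bitmap, header = quantize_block([60, 68, 190, 210], 128)
two_level = reconstruct(bitmap, header)
```

Storing one bit per pixel plus two values per block keeps the data as compact as an ordinary binary image while retaining the gray level of each side.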

[0034] The quantization processing of the present embodiment is performed as described above; the ranges given in expressions (3) and (4), however, are not limited to the values shown.

[0035] The quantization processing according to the present embodiment is now described in more detail with reference to a concrete image example. The process of determining the quantization threshold TH in the present embodiment is explained using the example histogram shown in FIG. 6.

[0036] FIG. 6 shows the histogram of a certain image (8-bit input). The horizontal axis is the digital luminance value, with the left end at "0", that is, black, and the right end at "255", that is, white; the vertical axis is the frequency of each digital value.

[0037] FIG. 7 shows how the value of each parameter changes during the processing of steps S305 and S306 when the quantization processing of FIG. 3 is applied to an image having a histogram such as that shown in FIG. 6. The parameter values in FIG. 7 are listed by the number of times steps S305 and S306 of FIG. 3 have been passed.

[0038] In the first pass through steps S305 and S306, the average value AV and the skew value Sk are calculated with START = 0 and END = 255, giving "177" and "-78.9", respectively. Because the skew value Sk is less than "-1.0", START = 0 and END = 177 are set in step S312 of FIG. 3. In the second pass, the average value AV and the skew value Sk are calculated with START = 0 and END = 177, giving "91" and "-8.6", respectively. Because the skew value Sk is again less than "-1.0", START = 0 and END = 91 are set in step S312 of FIG. 3.

[0039] In the third pass, the average value AV and the skew value Sk are calculated with START = 0 and END = 91, giving "43" and "9.6", respectively. Because the skew value Sk now exceeds "1.0", START = 43 and END = 91 are set in step S313 of FIG. 3. In the fourth pass, the average value AV and the skew value Sk are calculated with START = 43 and END = 91, giving "72" and "-7.0", respectively. Because the skew value Sk is less than "-1.0", START = 43 and END = 72 are set in step S312 of FIG. 3.

[0040] In the fifth pass, the average value AV and the skew value Sk are calculated with START = 43 and END = 72, giving "58" and "-2.2", respectively. Because the skew value Sk is again less than "-1.0", START = 43 and END = 58 are set in step S312 of FIG. 3. In the sixth pass, the average value AV and the skew value Sk are calculated with START = 43 and END = 58, giving "50" and "-0.4", respectively.

[0041] At this point the skew value Sk is at least "-1.0" and at most "1.0", so neither the condition of step S307 nor that of step S308 in FIG. 3 is satisfied (both determinations are NO). The process therefore proceeds to step S309, where "50" is set as the quantization threshold TH. In the following step S310, quantization is performed using this quantization threshold TH, and the quantized image is stored in the storage unit 102 in the image processing apparatus 1.

[0042] In this quantization, the average value of the region below the quantization threshold TH is taken as representative value 1, the average value of the region above the quantization threshold TH is taken as representative value 2, and quantization is performed with these two values. The representative values may, however, be any quantities that characterize the region below the quantization threshold TH and the region above it; for example, the median may be used in place of the average.

[0043] <Description of Image Area Separation Processing> The image area separation processing that uses this quantization result (step S203 in FIG. 2) is described in detail below with reference to the flowchart shown in FIG. 8.

[0044] First, in step S801 of FIG. 8, the quantized image is input and stored in the storage unit 102. In step S802, the input image is thinned so that each m × n pixel group becomes one pixel, producing an image for image area separation. If even one black pixel exists among the m × n pixels, the resulting pixel is made black. In step S803, over all pixels of the image for image area separation, each area in which a predetermined number of black pixels are contiguous in the vertical, horizontal, or diagonal directions is treated as one region, and the image is divided into regions accordingly. Each region is labeled by numbering the regions in the order in which they are detected.
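Steps S802 and S803 can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of the region separation unit 106: the binary image is assumed to be a list of rows with 1 for black, the "predetermined number" of contiguous pixels is simplified to plain 8-neighbor connectivity, and the names `thin` and `label_regions` are hypothetical.

```python
from collections import deque


def thin(binary, tile_w, tile_h):
    """Reduce each tile_w x tile_h tile to one pixel, black if any is black."""
    height, width = len(binary), len(binary[0])
    return [[1 if any(binary[y][x]
                      for y in range(top, min(top + tile_h, height))
                      for x in range(left, min(left + tile_w, width)))
             else 0
             for left in range(0, width, tile_w)]
            for top in range(0, height, tile_h)]


def label_regions(binary):
    """Label 8-connected black regions in detection (raster scan) order."""
    height, width = len(binary), len(binary[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if not binary[y][x] or labels[y][x]:
                continue
            count += 1                      # new region found
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:                    # breadth-first flood fill
                cy, cx = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


# Usage: thin a 4 x 4 image by 2 x 2 tiles, then label the result.
page = [[1, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]
small = thin(page, 2, 2)
small_labels, small_count = label_regions(small)
```

Because a tile becomes black whenever any of its pixels is black, thin strokes survive the reduction, and nearby characters tend to merge into a single labeled region.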

[0045] Next, in step S804, the regions are classified according to each region's width, height, area, and black pixel density within the region, and each is labeled with an attribute. Region attributes include, for example, "table", "outer frame region", and "text". In step S805, the averages of the widths and heights of all regions labeled "text" are calculated. If the resulting average width is greater than the average height, the processed image is regarded as horizontally written; in the opposite case it is regarded as vertically written. The text orientation is determined in this way. At the same time, the average height (for horizontal writing) or the average width (for vertical writing) is taken as the size of one character.
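The orientation test of step S805 can be sketched as below. It is a minimal illustration: the function name `text_layout` is hypothetical, each "text" region is assumed to be given as a (width, height) pair, and the case where the two averages are equal, which the text does not cover, is treated here as vertical writing.

```python
def text_layout(text_regions):
    """Return (orientation, character_size) from (width, height) pairs.

    Horizontal writing: average width > average height, size = average height.
    Otherwise vertical writing, size = average width.
    """
    avg_w = sum(w for w, _ in text_regions) / len(text_regions)
    avg_h = sum(h for _, h in text_regions) / len(text_regions)
    if avg_w > avg_h:
        return "horizontal", avg_h
    return "vertical", avg_w


# Usage: wide, short regions suggest lines of horizontally written text.
rows = text_layout([(200, 20), (180, 24)])
columns = text_layout([(20, 300), (24, 280)])
```

The character size obtained here is what step S806 compares against when deciding which "text" regions are large enough to be titles.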

[0046] In addition, the column layout and line spacing of the text are detected from the histograms of all "text" regions on the image for image area separation, taken in the vertical direction (for horizontal writing) or the horizontal direction (for vertical writing). In step S806, "text" regions with a large character size are labeled "title".

[0047] With conventional region determination based on a binarized image, however, even if a band emphasizing the title exists in the background of a region identified as a title, the background information has been lost, so the band's presence cannot be recognized. For the same reason, even if the title characters themselves are colored, they are simply judged to be "black characters". A document author puts a band behind a title or colors its characters in order to distinguish that title from other titles, yet the conventional method judges all of them to be the same "title"; this was its weakness.

[0048] The greatest advantage of using the present quantized image as the image for region separation lies precisely in improving this point. For example, suppose a title in the original image is rendered, as at 901 in FIG. 9, with gray title characters of value 200 in 8-bit representation on a gray band of value 64. With a conventional binarized image, the title is binarized as at 902 in FIG. 9 regardless of its character color and background color, so the character color and background color information is lost at the moment of binarization.

[0049] With the present quantization, by contrast, as at 903 in FIG. 9, the pixel average value BBV of the region below the quantization threshold TH calculated for, say, the left 64 × 64 pixel block, in this case the region of the character "T" in "TITLE", is calculated as 64. Likewise, within the same block, the average value WBV of the region above the quantization threshold TH, in this case the region corresponding to the background color, is calculated as 200. Because BBV and WBV are attached to the ordinary binarized image for every 64 × 64 pixel block, the title character color and the background color can be distinguished to a considerable degree.

[0050] In the present embodiment, the processing shown in FIG. 10 is applied to regions determined to be titles. First, in step S1001, region determination is performed using only the image information quantized as described above. The processing of step S1002 is then executed for each region determined to be a "title". The symbols WBV, BBV, PW, and PB in the figure have the following meanings.

[0051]
WBV: average value of the region above the quantization threshold TH calculated for each block
BBV: average value of the region below the quantization threshold TH calculated for each block
PW: representative value of the background (paper) area of the original image
PB: representative value of the character area of the original image

WBV and BBV are attached to the binarized information during the present quantization, for example for every 64 × 64 pixel block. PW is a representative value of the background of the original image and corresponds to the white density of the paper as read by the scanner. Because PW varies somewhat between individual scanners and with paper type, it is set in advance on the basis of several kinds of samples. PB is a representative value of the character area of the original image and corresponds to the density of characters printed on the paper as read by the scanner. Like PW, PB varies somewhat between individual scanners and with paper type, so it too is set in advance on the basis of several kinds of samples.

【００５２】[0052]

First, WBV is compared with PW. If WBV is smaller than PW, the block is presumed to have a background color darker than the paper base. Next, BBV is compared with PB. If BBV is larger than PB, the characters printed in the block are presumed to be colored characters lighter than normal black characters. Accordingly, if WBV is smaller than PW or BBV is larger than PB, the title in this area is judged to have a background color or character color that emphasizes the title, and the process branches to step S1003; otherwise, the process proceeds to step S1004. In step S1003 the area is labeled as an "emphasized title", and in step S1004 it is labeled as a "normal title".
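
The comparison in steps S1002 to S1004 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `BlockStats` type, the function name, and the sample PW and PB values in the usage note are hypothetical, and luminance is assumed to grow with brightness.

```python
from dataclasses import dataclass


@dataclass
class BlockStats:
    """Per-block values attached to the binarized data (hypothetical container)."""
    wbv: float  # mean luminance of pixels above the block threshold TH
    bbv: float  # mean luminance of pixels below the block threshold TH


def classify_title(block: BlockStats, pw: float, pb: float) -> str:
    """Label a title block as emphasized or normal.

    pw: preset representative luminance of the paper base.
    pb: preset representative luminance of normal black characters.
    """
    darker_background = block.wbv < pw   # background darker than plain paper
    lighter_characters = block.bbv > pb  # characters lighter than black print
    if darker_background or lighter_characters:
        return "emphasized title"        # corresponds to step S1003
    return "normal title"                # corresponds to step S1004
```

For example, with assumed presets `pw=230` and `pb=40`, a block with `wbv=200` would be labeled an emphasized title because its background is darker than the paper base.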

【００５３】[0053]

Then, in step S807, the "title" and "text" areas, which up to this point exist separately and without any relation to one another, are merged according to their spacing from the surrounding areas to form single consolidated areas.
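
One way to picture the merging in step S807 is a repeated union of bounding boxes that lie close together. The sketch below is only an assumption about how such merging could work: the text does not give the distance measure or the limit, so the `gap` function, the `max_gap` parameter, and the `(x0, y0, x1, y1)` box layout are all hypothetical.

```python
def gap(a, b):
    """Distance between two boxes (x0, y0, x1, y1); 0 if they touch or overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return max(dx, dy)


def merge_regions(boxes, max_gap):
    """Merge boxes whose mutual gap is at most max_gap until none remain."""
    boxes = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if gap(boxes[i], boxes[j]) <= max_gap:
                    a, b = boxes[i], boxes[j]
                    # Replace box i with the union of both, then drop box j.
                    boxes[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    changed = True
                    break
            if changed:
                break
    return boxes
```

A real implementation would presumably merge only areas with compatible attributes and would keep the pre-merge line data for text areas; both are omitted here for brevity.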

【００５４】[0054]

Next, in step S808, area data such as the attribute and the coordinates and size in the original image are output for each area. Through the above processing, image area separation is performed on the quantized image and the area data is obtained.

【００５５】[0055]

FIG. 11 is a diagram showing an example of the area data described above. Each area data item shown in the figure is described below.
・"Number": indicates the order in which the area was detected.
・"Attribute": indicates the attribute information of the area; the following nine attributes are provided.

【００５６】[0056]

"Root": indicates that the area is the input image itself.

【００５７】[0057]

"Text": indicates a character area.

【００５８】[0058]

"Emphasized title": indicates an emphasized heading area.

【００５９】[0059]

"Normal title": indicates a normal heading area.

【００６０】[0060]

"Table": indicates a table area.

【００６１】[0061]

"Noise area": indicates an area that could not be judged to be either characters or an image.
"Outer frame area": indicates an area such as a ruled line.

【００６２】[0062]

"Photo image": indicates a photograph area.

【００６３】[0063]

"Line image": indicates a line image area.
・"Start point coordinates": the X and Y coordinates at which the area starts in the original image.
・"End point coordinates": the X and Y coordinates at which the area ends in the original image.
・"Number of pixels": the total number of pixels in the area.
・"Typesetting information": one of three kinds of typesetting information: vertical writing, horizontal writing, or unknown.

【００６４】[0064]

In the area data shown in FIG. 11, only the areas whose "attribute" is "text" hierarchically retain the area data for individual lines (line area data) as it was before the merging in step S807 of FIG. 8.
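
The area data items listed above map naturally onto a small record type. The following sketch assumes nothing beyond the items named in the text; the class names, field names, and enum values are hypothetical, and the `lines` field stands in for the hierarchically retained line area data of text areas.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class Attribute(Enum):
    """The nine area attributes listed for FIG. 11."""
    ROOT = "root"
    TEXT = "text"
    EMPHASIZED_TITLE = "emphasized title"
    NORMAL_TITLE = "normal title"
    TABLE = "table"
    NOISE = "noise area"
    OUTER_FRAME = "outer frame area"
    PHOTO = "photo image"
    LINE_IMAGE = "line image"


class Typesetting(Enum):
    VERTICAL = "vertical"
    HORIZONTAL = "horizontal"
    UNKNOWN = "unknown"


@dataclass
class AreaData:
    number: int                    # detection order of the area
    attribute: Attribute
    start: Tuple[int, int]         # (X, Y) where the area starts in the original image
    end: Tuple[int, int]           # (X, Y) where the area ends in the original image
    pixel_count: int               # total number of pixels in the area
    typesetting: Typesetting = Typesetting.UNKNOWN
    # Pre-merge line areas; only populated when attribute is TEXT.
    lines: List["AreaData"] = field(default_factory=list)
```

Keeping the line areas as nested records of the same type is one simple way to represent the hierarchy; an apparatus could equally store them in a separate table keyed by area number.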

【００６５】[0065]

Image area separation is performed as described above. The area data shown in FIG. 11 is merely one example to which the present embodiment is applied; depending on the image processing apparatus, other information may be added, or items may be removed, as appropriate.

【００６６】[0066]

As described above, according to the present embodiment, the quantization threshold is determined so that the luminance frequency of the input multi-valued image, and the skew value indicating its bias, converge to a predetermined value, and quantization is performed on the basis of that threshold. The region containing the threshold best suited to separating the background from the object in the image is thereby identified, and the average luminance value of this region is used for quantization. As a result, the optimum value for classifying the luminance value of each pixel in an area of the multi-valued input image into the two classes of background and object can be found easily, and high-precision OCR processing becomes possible.

【００６７】[0067]

In the embodiment described above, the input image is 8-bit multi-valued image data, but the present invention is not limited to this. Any image, such as a color image, is acceptable as long as its image information has multiple bits to be quantized. Likewise, the convergence condition for the skew value Sk, a statistic, was set to ±1.0, but it is not limited to this value. In other words, any configuration that determines the binarization threshold using the skew value Sk is sufficient.
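
A skew-driven threshold search of the kind summarized here might look like the sketch below. It is an illustration under stated assumptions rather than the procedure of the embodiment: the moment-based skewness formula, the rule for narrowing the luminance range toward the side of the tail, and the names `skew_threshold`, `limit`, and `max_iter` are all hypothetical. Only the ideas of iterating until the skew value falls within a range such as ±1.0 and taking the mean luminance of the remaining range as the threshold come from the text.

```python
import math


def skew_threshold(hist, limit=1.0, max_iter=32):
    """Return a threshold from a luminance histogram (list of counts per level)."""
    start, end = 0, len(hist) - 1
    mean = (start + end) / 2.0
    for _ in range(max_iter):
        levels = range(start, end + 1)
        n = sum(hist[v] for v in levels)
        if n == 0:
            break
        mean = sum(v * hist[v] for v in levels) / n
        var = sum(hist[v] * (v - mean) ** 2 for v in levels) / n
        if var == 0:
            break  # all remaining pixels share one level
        skew = sum(hist[v] * (v - mean) ** 3 for v in levels) / n / var ** 1.5
        if abs(skew) < limit:
            break  # distribution is balanced enough; mean is the threshold
        # Assumed narrowing rule: keep the side of the mean where the tail lies.
        if skew < 0:
            new_start, new_end = start, int(mean)
        else:
            new_start, new_end = math.ceil(mean), end
        if (new_start, new_end) == (start, end):
            break
        start, end = new_start, new_end
    return mean
```

For a histogram with two equal peaks the skewness is zero on the first pass, so the function returns the midpoint between the peaks without narrowing the range.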

【００６８】[0068]

The present invention may be applied to a system composed of a plurality of devices (for example, a host computer, an interface device, a reader, and a printer) or to an apparatus consisting of a single device (for example, a copying machine or a facsimile machine).

【００６９】[0069]

It goes without saying that the object of the present invention is also achieved by supplying a system or an apparatus with a storage medium on which the program code of software realizing the functions of the above-described embodiment is recorded, and having a computer (CPU or MPU) of that system or apparatus read and execute the program code stored in the storage medium.

【００７０】[0070]

In this case, the program code read from the storage medium itself realizes the functions of the above-described embodiment, and the storage medium storing that program code constitutes the present invention.

【００７１】[0071]

Storage media that can be used to supply the program code include, for example, a floppy disk, a hard disk, an optical disk, a magneto-optical disk, a CD-ROM, a CD-R, a magnetic tape, a nonvolatile memory card, and a ROM.

【００７２】[0072]

It also goes without saying that the invention covers not only the case where the functions of the above-described embodiment are realized by the computer executing the program code it has read, but also the case where an OS (operating system) or the like running on the computer performs part or all of the actual processing based on the instructions of the program code, and the functions of the embodiment are realized by that processing.

【００７３】[0073]

Furthermore, it goes without saying that the invention also covers the case where the program code read from the storage medium is written into a memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided on that function expansion board or function expansion unit performs part or all of the actual processing based on the instructions of the program code, and the functions of the above-described embodiment are realized by that processing.

【００７４】[0074]

[Effects of the Invention] As described above, according to the present invention, the region where the luminance frequency is minimal is identified on the basis of the luminance frequency of the multi-valued input image and the bias of its distribution, and quantization is performed using the average luminance value of the identified region as the quantization threshold. An appropriate threshold can thereby be set between the object density and the background density in the image, and an image with suppressed block distortion can be obtained.

【００７５】[0075]

[Brief description of the drawings]

FIG. 1 is a block diagram illustrating the configuration of the image processing system according to the present embodiment.

FIG. 2 is a flowchart illustrating the image area separation OCR processing according to the present embodiment.

FIG. 3 is a flowchart illustrating the binarization processing according to the present embodiment.

FIG. 4 is a diagram for explaining the quantization processing in step S9 shown in FIG. 3.

FIG. 5 is a diagram for explaining a quantization result.

FIG. 6 is a diagram illustrating an example of an image histogram according to the present embodiment.

FIG. 7 is a diagram illustrating an example of conversion of each variable value in the binarization processing according to the present embodiment.

FIG. 8 is a flowchart illustrating the image area separation processing according to the present embodiment.

FIG. 9 is a diagram for explaining emphasized titles according to the present embodiment.

FIG. 10 is a flowchart illustrating the title determination processing according to the present embodiment.

FIG. 11 is a diagram illustrating an example of area data according to the present embodiment.

[Explanation of symbols]

1 image processing device
2 image input device
3 image display device
101 input unit
102 storage unit
103 luminance frequency accumulation unit
104 quantization threshold calculation unit
105 quantization unit
106 area separation unit
107 character recognition unit
108 image processing unit
109 output unit

Claims

[Claims]

1. An image processing apparatus for performing image processing by quantizing a multi-valued image, comprising: first calculating means for calculating a luminance frequency of the multi-valued image; specifying means for specifying a quantization threshold based on the calculated luminance frequency; second calculating means for calculating, based on the specified quantization threshold and the luminance frequency, a representative value used for quantization of the multi-valued image; and quantizing means for quantizing the multi-valued image using the calculated representative value.

2. The image processing apparatus according to claim 1, wherein the quantization threshold is the average luminance value obtained when the histogram distribution of the luminance frequency has been converged so that its bias falls within a predetermined range.

3. The image processing apparatus according to claim 1, wherein the representative value is an average luminance value in each distribution area of a luminance frequency histogram distribution divided by the quantization threshold.

4. The image processing apparatus according to claim 1, wherein the representative value is a central luminance value in each distribution area of a luminance frequency histogram distribution divided by the quantization threshold.

5. The image processing apparatus according to claim 1, further comprising means for performing image area separation on the image quantized by said quantizing means and outputting area data including an attribute of each separated area.

6. The image processing apparatus according to claim 5, further comprising means for determining, when the attribute of the area is a title, whether the title is an emphasized title.

7. An image processing method for performing image processing by quantizing a multi-valued image, comprising: calculating a luminance frequency of the multi-valued image; specifying a quantization threshold for quantization based on the calculated luminance frequency; calculating, based on the specified quantization threshold and the luminance frequency, a representative value used for quantization of the multi-valued image; and quantizing the multi-valued image using the calculated representative value.

8. The image processing method according to claim 7, wherein the quantization threshold is the average luminance value obtained when the histogram distribution of the luminance frequency has been converged so that its bias falls within a predetermined range.

9. The image processing method according to claim 7, wherein the representative value is an average luminance value in each distribution area of a luminance frequency histogram distribution divided by the quantization threshold.

10. The image processing method according to claim 7, wherein the representative value is a central luminance value in each distribution area of a luminance frequency histogram distribution divided by the quantization threshold.

11. The image processing method according to claim 7, further comprising the step of performing image area separation on the image quantized in the quantization step and outputting area data including an attribute of each separated area.

12. The image processing method according to claim 11, further comprising the step of determining, when the attribute of the area is a title, whether the title is an emphasized title.

13. A computer-readable storage medium storing program code of an image processing method, the program code comprising: code for a step of calculating a luminance frequency of a multi-valued image; code for a step of specifying a quantization threshold based on the calculated luminance frequency; code for a step of calculating, based on the specified quantization threshold and the luminance frequency, a representative value used for quantization of the multi-valued image; and code for a step of quantizing the multi-valued image using the calculated representative value.