JPH064663A - Picture data binarization device

Info

Publication number: JPH064663A
Application number: JP4156879A
Authority: JP
Inventors: Mitsuaki Matatsuma; 光明俣妻
Original assignee: Seiko Instruments Inc
Current assignee: Seiko Instruments Inc
Priority date: 1992-06-16
Filing date: 1992-06-16
Publication date: 1994-01-14

Abstract

PURPOSE: To obtain binary picture data by calculating an optimum threshold that matches human visual characteristics and binarizing multilevel picture data with that threshold. CONSTITUTION: A binarization error calculation means 2 calculates, for each threshold, the binarization error that arises when the multilevel picture data is binarized, expressed as a squared error. A complexity calculation means 3 obtains, for each threshold, the degree of complexity of the binary picture produced by binarizing the multilevel picture data. An optimum threshold calculation means 4 finds the threshold at which the squared error is at its minimum and the thresholds at which the degree of complexity is at a minimum, and from these takes the complexity-minimum threshold nearest to the squared-error minimum as the optimum threshold. The multilevel picture data is binarized into binary picture data with this threshold. This prevents noise and jagged boundaries in character and graphic pictures after binarization, so a high-quality binary picture is obtained.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an image data binarization device that converts multi-valued image data into binary image data, and in particular to one that binarizes multi-valued character image data.

[0002]

[Prior Art] Among conventional techniques for binarizing multi-valued image data, one computes a histogram of the signal levels in the multi-valued image and binarizes using the valley of that histogram as the threshold. A variant of this approach takes the derivative or the Laplacian of the multi-valued image to make the histogram valley more pronounced before binarizing. Another method uses the squared error between the multi-valued image and the binary image as the measure of how good the resulting binary image is, and searches for the threshold that minimizes this squared error. A related idea exploits the perceptual tendency of humans to see a binary image as the simplest possible pattern (the Gestalt principle), and uses the complexity of the binary image as the optimality measure.

[0003] Such conventional techniques are disclosed in, for example, Otsu, "An automatic threshold selection method based on discriminant and least squares criteria", Transactions of the Institute of Electronics and Communication Engineers, '80/4, Vol. J63-D, No. 4, pp. 349-356, and Taniguchi and Kawaguchi, "A study on the complexity of binary images and thresholding of multi-valued images", Transactions of the Institute of Electronics and Communication Engineers, '87/4, Vol. J70-D, No. 1, pp. 164-173.

[0004]

[Problems to Be Solved by the Invention] However, the conventional method that takes the bottom of the histogram valley as the threshold does not compute the threshold from the standpoint of whether the resulting binary image is visually optimal.

[0005] Another method uses the least squared error between the multi-valued image data and the binary image data as the optimality measure, but this is not grounded in human visual characteristics. Moreover, near the minimum the squared error fluctuates and has many local minima, so simply picking the smallest value does not necessarily give the optimal threshold.

[0006] The method that uses complexity as the optimality measure is grounded in human visual characteristics. In practice, however, this method treats the threshold of minimum complexity as giving the optimal binary image, so simply taking the global complexity minimum as the threshold produces a binarized image that is entirely white or entirely black. To avoid this, a valley of the complexity curve is detected and used as the threshold instead. But in actual binarization, just as with the least squared error method, many local minima appear near the minimum of the complexity, and the simple minimum is not necessarily the optimal threshold.

[0007] As described above, the conventional methods cannot determine the optimal threshold, so the quality of the binary image degrades: noise appears in the generated binary image, and the boundaries of characters and line drawings become jagged. An object of the present invention is therefore to provide an image data binarization device that determines an optimal binarization threshold matched to human visual characteristics.

[0008]

[Means for Solving the Problems] To solve the above problems, the present invention provides an image data binarization device that binarizes multi-valued image data with a threshold, comprising: binarization error calculation means for calculating the binarization error that arises when the multi-valued image data is binarized; complexity calculation means for calculating the complexity of the binary image obtained by binarizing the multi-valued image data; optimal threshold calculation means for calculating, from the binarization error and the complexity, the threshold at which the binary image is optimal; threshold storage means for storing the optimal threshold; and binarization means for binarizing the multi-valued image data into binary image data using the threshold stored in the threshold storage means.

[0009]

[Operation] The image data binarization device configured as above calculates the threshold at which the binary image is optimal from both the binarization error that arises when the multi-valued image data is binarized and the complexity of the resulting binary image, and then binarizes the multi-valued image data into binary image data with that threshold.

[0010]

[Embodiments] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing an embodiment of the image data binarization device of the present invention. In FIG. 1, the binarization error calculation means 2 calculates the binarization error at each threshold from the multi-valued image data to be binarized, which is supplied by the multi-valued image data input means 1. The multi-valued image data is, for example, 8-bit image data obtained by scanning a character image, a graphic image, or the like. The method of calculating the binarization error is described with reference to the flowchart of FIG. 2.

[0011] Step 200 determines whether this is the first input of the multi-valued image data. If it is, processing goes to step 201; otherwise it goes to step 203. Step 201 builds a density histogram from the multi-valued image data, with signal level on the horizontal axis and the number of pixels at each signal level on the vertical axis. For image data of characters and line drawings, this histogram is bimodal, with two peaks as shown in FIG. 3.

[0012] Step 202 reads off the signal levels at the tops of the two peaks in the histogram built in step 201. In FIG. 3, for example, the signal level at the top of the peak on the low-signal-level side is read as C1, and the signal level at the top of the peak on the high-signal-level side is read as C2.
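Steps 201 and 202 can be sketched in Python as follows. The text does not say how the two peaks are located, so the peak rule here (the most populated level, then the most populated level at least `min_separation` levels away) is an assumption, as are the function names and the sample data.

```python
def density_histogram(pixels, levels=256):
    """Step 201: count how many pixels take each signal level."""
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1
    return hist


def two_peaks(hist, min_separation=32):
    """Step 202: return (C1, C2), the levels of the low-side and high-side peaks.

    Assumption (not from the source): the first peak is the most populated
    level, and the second is the most populated level that lies at least
    `min_separation` levels away from it.
    """
    first = max(range(len(hist)), key=lambda level: hist[level])
    candidates = [lv for lv in range(len(hist)) if abs(lv - first) >= min_separation]
    second = max(candidates, key=lambda level: hist[level])
    return min(first, second), max(first, second)


# Made-up bimodal data: dark strokes near level 20, bright paper near level 220.
pixels = [20] * 30 + [25] * 10 + [120] * 2 + [215] * 15 + [220] * 60
hist = density_histogram(pixels)
c1, c2 = two_peaks(hist)
```

A real scan would likely need some smoothing of the histogram before peak picking; that is left out of this sketch.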

[0013] Step 203 binarizes the multi-valued image data with threshold θ according to Equation 1:

$$g_i = \begin{cases} C_1 & (f_i < \theta) \\ C_2 & (f_i \ge \theta) \end{cases} \qquad \text{(Equation 1)}$$

Here f_i is the signal level of a pixel in the multi-valued image data (the subscript i denotes the pixel position), and g_i is the signal level of the binary image data obtained from Equation 1. The initial value of θ is 0. As described later, step 203 is repeated multiple times, and θ is incremented by 1 each time.
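As a minimal illustration of Equation 1, the sketch below maps each level to C1 or C2. The function name `binarize` and the sample values are illustrative assumptions.

```python
def binarize(pixels, theta, c1, c2):
    """Equation 1: a level f_i below theta becomes C1, otherwise C2."""
    return [c1 if f < theta else c2 for f in pixels]


# Made-up levels, with C1 = 20 and C2 = 220 standing in for the two histogram peaks.
pixels = [10, 40, 128, 200, 250]
binary = binarize(pixels, theta=128, c1=20, c2=220)
```

Note that the binary image keeps the two peak levels C1 and C2 as its values, which is what lets it be compared with the original levels in the error calculation.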

[0014] Step 204 calculates the mean squared error e(θ) between the multi-valued image and the binary image produced in step 203 with threshold θ. The calculation is given by Equation 2, where N is the total number of pixels.
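Equation 2 itself does not appear in this text. Since e(θ) is described as the mean squared error between the multi-valued image f and the binary image g over N pixels, it would presumably take the standard form below. This is a reconstruction from the surrounding definitions, not the original formula.

```latex
e(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left( f_i - g_i \right)^2
```

Here g_i depends on θ through Equation 1.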

[0015] Step 205 checks whether the threshold θ of step 203 is less than the maximum signal level M. For an 8-bit signal, for example, M is 255. If θ is less than M, processing goes to step 206. If θ is M or greater, it goes to step 207.

[0016] Step 206 adds 1 to the threshold θ to give the new threshold θ for binarization in step 203. At step 207, the mean squared error e(θ) has been generated for every threshold θ (0 to M). This e(θ) is the binarization error. Plotted with θ on the horizontal axis and e(θ) on the vertical axis, it normally forms a curve like that of FIG. 4, with peaks at θ = 0 and θ = M, a valley in the middle, and many local minima within the valley.
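The loop of steps 203 to 207 can be sketched as below. It assumes that Equation 2 is the usual mean of squared differences between f_i and g_i; the function name and sample data are illustrative.

```python
def binarization_error_curve(pixels, c1, c2, max_level=255):
    """Return [e(0), ..., e(M)]: the mean squared error at every threshold."""
    n = len(pixels)
    curve = []
    for theta in range(max_level + 1):      # steps 203 to 206: theta = 0 .. M
        total = 0
        for f in pixels:
            g = c1 if f < theta else c2     # Equation 1
            total += (f - g) ** 2
        curve.append(total / n)             # assumed form of Equation 2
    return curve


# Made-up data clustered around C1 = 20 and C2 = 220.
pixels = [20] * 4 + [30] * 2 + [210] * 2 + [220] * 4
errors = binarization_error_curve(pixels, c1=20, c2=220)
theta_min = min(range(len(errors)), key=lambda t: errors[t])
```

With this tiny data set the error is flat across the whole gap between the two clusters, so `theta_min` is simply the first threshold of that plateau. Real data gives the fluctuating valley described for FIG. 4.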

[0017] In FIG. 1, the complexity calculation means 3 calculates the complexity of the binary image generated from the multi-valued image data. The method of calculating the complexity is described with reference to the flowchart of FIG. 5. Step 501 binarizes the multi-valued image data with threshold θ according to Equation 1.

[0018] Here f is the signal level of a pixel in the multi-valued image data, and g is the signal level of the binary image data obtained from Equation 1. The initial value of θ is 0. As described later, step 501 is repeated multiple times, and θ is incremented by 1 each time.

[0019] Step 502 calculates the complexity C(θ) from the binary image data g obtained with threshold θ in step 501. There are three ways to calculate the complexity, depending on the measure used. The first measure is the number of black and white connected components, the second is the length of the black-white boundary, and the third is the number of primitives in the DF representation.

[0020] The number of connected components is explained first. Connectivity is defined here as 4-connectivity, meaning that a pixel is connected to another pixel when the two are adjacent across a shared edge. The number of connected components is the number of clusters of same-valued pixels, such as the hatched clusters shown in FIG. 6. Since both black and white are counted under 4-connectivity, the number of connected components in FIG. 6 is 3 black plus 1 white, for a total of 4. Here a white pixel has signal level 1 and a black pixel has signal level 0.

[0021] The method of calculating complexity with the number of connected components as the measure is described with reference to the flowchart of FIG. 7. Step 701 selects the pixel of interest. Basically, the image is scanned with left to right as the main scanning direction and top to bottom as the sub-scanning direction: the first pixel of interest is the top-left pixel, and each subsequent pixel of interest is the right-hand neighbor of the previous one. However, any pixel that has already been a pixel of interest is skipped, because, as described below, step 702 may also select pixels of interest.

[0022] Step 702 makes one connected group of pixels into pixels of interest. As shown in FIG. 8, the four pixels A, B, C, and D around the pixel of interest X are checked for connection to X, and every connected pixel becomes a pixel of interest. For each new pixel of interest, the four surrounding pixels are checked in the same way and the connected ones become pixels of interest. This is repeated until no connected pixels remain, at which point processing goes to step 703.

[0023] Step 703 counts the number of times step 702 runs out of connected pixels, that is, the number of connected components α. Step 704 checks whether every pixel has been a pixel of interest. If so, processing goes to step 705; if not, it goes to step 701.
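The counting loop of steps 701 to 704 amounts to a flood fill over 4-connected neighbors. The sketch below uses a breadth-first queue, which is an implementation choice rather than something the text prescribes. It also applies the normalization of step 705 (Equation 4) so that it returns a complexity value. The sample image is made up, chosen to have 3 black and 1 white component.

```python
from collections import deque


def connected_component_complexity(image):
    """Count 4-connected components of both colours; return (alpha, C)."""
    rows, cols = len(image), len(image[0])
    visited = [[False] * cols for _ in range(rows)]
    alpha = 0
    for r in range(rows):                   # step 701: raster scan for the next seed
        for c in range(cols):
            if visited[r][c]:
                continue                    # already absorbed into a component
            alpha += 1                      # step 703: one more component
            visited[r][c] = True
            queue = deque([(r, c)])
            while queue:                    # step 702: grow through 4-neighbours
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not visited[ny][nx]
                            and image[ny][nx] == image[y][x]):
                        visited[ny][nx] = True
                        queue.append((ny, nx))
    return alpha, alpha / (rows * cols)     # Equation 4


# 1 = white, 0 = black.
image = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 1],
]
alpha, complexity = connected_component_complexity(image)
```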

[0024] Step 705 calculates the complexity C(θ) from the number of connected components α using Equation 4, and the procedure ends.

$$C(\theta) = \frac{\alpha}{\text{total number of pixels}} \qquad \text{(Equation 4)}$$

Next, the method of calculating complexity with the boundary length as the measure is described with reference to the flowchart of FIG. 9.

[0025] Step 901 selects the pixel of interest. For example, the image is scanned with left to right as the main scanning direction and top to bottom as the sub-scanning direction: the first pixel of interest is the top-left pixel, and each subsequent pixel of interest is the right-hand neighbor of the previous one. Step 902 checks, as shown in FIG. 10, whether the pixel of interest X and its right-hand neighbor A are of different kinds, one white and one black. If they differ, processing goes to step 903; if they are the same, it goes to step 904. If there is no right-hand neighbor, processing goes to step 903 unconditionally.

[0026] Step 904 checks, as shown in FIG. 10, whether the pixel of interest X and the pixel B below it are of different kinds, one white and one black. If they differ, processing goes to step 905; if they are the same, it goes to step 906. If there is no pixel below, processing goes to step 905 unconditionally.

[0027] Steps 903 and 905 count the number of times steps 902 and 904 found differing pixels; this count is the boundary length α. Step 906 checks whether every pixel has been scanned. If so, processing goes to step 907; otherwise it goes to step 901.
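The boundary-length count of steps 901 to 906 can be sketched as below. Following the text, a missing right or lower neighbor at the image edge is counted unconditionally. The normalization of step 907 (Equation 5) is included so that the function returns a complexity value. The function name and sample image are illustrative.

```python
def boundary_length_complexity(image):
    """Count colour changes to the right and below each pixel; return (alpha, C)."""
    rows, cols = len(image), len(image[0])
    alpha = 0
    for r in range(rows):                   # step 901: raster scan
        for c in range(cols):
            # steps 902/903: right neighbour differs, or there is none
            if c + 1 == cols or image[r][c + 1] != image[r][c]:
                alpha += 1
            # steps 904/905: lower neighbour differs, or there is none
            if r + 1 == rows or image[r + 1][c] != image[r][c]:
                alpha += 1
    return alpha, alpha / (rows * cols * 2)  # Equation 5


# 1 = white, 0 = black: a vertical black/white split.
image = [
    [0, 1],
    [0, 1],
]
alpha, complexity = boundary_length_complexity(image)
```

Each pixel contributes at most 2 to α, which is why dividing by twice the pixel count keeps C(θ) between 0 and 1.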

[0028] Step 907 calculates the complexity C(θ) from the boundary length α using Equation 5, and the procedure ends.

$$C(\theta) = \frac{\alpha}{\text{total number of pixels} \times 2} \qquad \text{(Equation 5)}$$

Next, taking the image of FIG. 11 as an example, the method of calculating the number of primitives in the DF representation is described with reference to the flowchart of FIG. 12. To calculate the number of primitives, the image must measure a power of 2 by a power of 2 pixels, such as the 8 × 8 image shown in FIG. 11.

[0029] Step 1201 divides the image into four sub-images A1, A2, A3, and A4, as shown in FIG. 13. Step 1202 builds a quadtree from the sub-images of step 1201. A sub-image is marked black if all of its pixels are black, and white if all or some of its pixels are white. For the image of FIG. 11, the quadtree after the first division is as shown in FIG. 14.

[0030] Step 1203 checks whether the sub-images can be divided into four again. If they can, processing goes to step 1201; if not, it goes to step 1204. Step 1204 counts the leaf nodes of the quadtree built in step 1202 and takes that count as the number of primitives. For the image of FIG. 11, for example, the quadtree of FIG. 15 is built; the circled marks are the leaf nodes, of which there are 13 in this case. The number of primitives is therefore 13.

[0031] Step 1205 calculates the complexity C(θ) from the number of primitives using Equation 6.

$$C(\theta) = \frac{\text{number of primitives}}{\text{total number of pixels}} \qquad \text{(Equation 6)}$$

This concludes the description of the three complexity calculation methods used in step 502 of FIG. 5.
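The quadtree measure can be sketched as a recursive leaf count. The text leaves some details of the tree construction open, so this sketch assumes a conventional region quadtree: a block that is entirely one color is a leaf, and a mixed block is split into four. It also assumes a square image whose side is a power of 2. The function names and the sample image are illustrative, and the sample is not the image of FIG. 11.

```python
def count_leaves(image, top=0, left=0, size=None):
    """Count the leaf nodes of a region quadtree over a square binary image.

    Assumption (not spelled out in the source): a uniform block is a leaf,
    and a mixed block is divided into four equal sub-blocks.
    """
    if size is None:
        size = len(image)
    first = image[top][left]
    uniform = all(image[r][c] == first
                  for r in range(top, top + size)
                  for c in range(left, left + size))
    if uniform or size == 1:
        return 1
    half = size // 2
    return sum(count_leaves(image, top + dy, left + dx, half)
               for dy in (0, half) for dx in (0, half))


def df_complexity(image):
    """Return (number of primitives, C) with C as in Equation 6."""
    leaves = count_leaves(image)
    return leaves, leaves / (len(image) ** 2)


# 1 = white, 0 = black: a single black pixel in the top-left corner.
image = [
    [0, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
primitives, complexity = df_complexity(image)
```

In this example three quadrants are uniform leaves and the mixed top-left quadrant splits into four single-pixel leaves.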

[0032] Step 503 of FIG. 5 checks whether the threshold θ of step 501 is less than or equal to the maximum multi-level signal level M (255 for 8 bits). If θ is M or less, processing goes to step 504; otherwise it goes to step 505. Step 504 adds 1 to the threshold θ to give the new threshold for step 501.

[0033] At step 505, the complexity C(θ) has been generated by step 502 for each threshold θ. Plotted with θ on the horizontal axis and C(θ) on the vertical axis, this complexity normally forms a curve like that of FIG. 16, with valleys at both ends and near the center. The optimal threshold calculation means 4 of FIG. 1 is described next. It determines the optimal threshold θ from the binarization error e(θ) calculated by the binarization error calculation means 2 and the complexity C(θ) calculated by the complexity calculation means 3. There are two methods of determining the optimal threshold.

[0034] Each is described below. The first method of determining the optimal threshold is described with reference to FIG. 17. Step 1701 finds the threshold θ at every local minimum of the complexity C(θ), giving θ1, θ2, ..., θn.

[0035] Step 1702 finds the threshold θmin that minimizes the binarization error e(θ). Step 1703 takes the difference between each θi of step 1701 and θmin of step 1702, finds the θi closest to θmin, and takes that θi as the optimal threshold.
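The first method (steps 1701 to 1703) can be sketched as below. The text does not define a local minimum precisely. This sketch assumes an interior point that is lower than its left neighbor and no higher than its right neighbor, which leaves out the two endpoints. The function names and the two sample curves are made up for illustration.

```python
def local_minima(curve):
    """Interior indices lower than the left neighbour and not above the right one.

    Assumption: endpoints are excluded, and a flat run is reported once,
    at its left edge.
    """
    return [i for i in range(1, len(curve) - 1)
            if curve[i] < curve[i - 1] and curve[i] <= curve[i + 1]]


def optimal_threshold_method1(errors, complexities):
    minima = local_minima(complexities)                              # step 1701
    theta_min = min(range(len(errors)), key=lambda t: errors[t])     # step 1702
    # step 1703: the complexity minimum closest to the error minimum
    # (raises ValueError if the complexity curve has no interior minimum)
    return min(minima, key=lambda t: abs(t - theta_min))


# Made-up curves indexed by threshold 0..9.
errors =       [9, 7, 5, 4, 3, 4, 5, 6, 8, 9]
complexities = [0, 3, 2, 4, 5, 3, 4, 6, 5, 0]
best = optimal_threshold_method1(errors, complexities)
```

Here the complexity minima are at thresholds 2 and 5 and the error minimum is at 4, so threshold 5 is selected.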

[0036] The second method of determining the optimal threshold is described with reference to FIG. 18. Step 1801 finds the thresholds θa and θb corresponding to the first and second largest values of the complexity C(θ). Step 1802 finds the threshold θmin that minimizes the complexity C(θ) within the interval from θa to θb.

[0037] Step 1803 calculates the thresholds θ1, θ2, ..., θn at which the binarization error e(θ) has a local minimum. Step 1804 takes the difference between each θi of step 1803 and θmin of step 1802, finds the θi closest to θmin, and takes that θi as the optimal threshold.
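The second method (steps 1801 to 1804) can be sketched in the same style. One reading is assumed here: the "first and second largest values" of C(θ) are taken to be its two highest local peaks, since the two largest raw values could be adjacent thresholds. That reading, the function names, and the sample curves are assumptions.

```python
def local_extrema(curve, kind):
    """Interior local minima (kind="min") or maxima (kind="max") of a curve."""
    sign = 1 if kind == "min" else -1
    return [i for i in range(1, len(curve) - 1)
            if sign * curve[i] < sign * curve[i - 1]
            and sign * curve[i] <= sign * curve[i + 1]]


def optimal_threshold_method2(errors, complexities):
    # step 1801 (assumed reading): the two highest local peaks of C(theta);
    # fails if the complexity curve has fewer than two interior peaks
    peaks = sorted(local_extrema(complexities, "max"),
                   key=lambda t: complexities[t], reverse=True)[:2]
    theta_a, theta_b = min(peaks), max(peaks)
    # step 1802: the complexity minimum between the two peaks
    theta_min = min(range(theta_a, theta_b + 1), key=lambda t: complexities[t])
    minima = local_extrema(errors, "min")                            # step 1803
    return min(minima, key=lambda t: abs(t - theta_min))             # step 1804


# Made-up curves indexed by threshold 0..9.
errors =       [9, 6, 7, 5, 6, 4, 3, 5, 8, 9]
complexities = [0, 2, 6, 4, 1, 2, 3, 7, 3, 0]
best = optimal_threshold_method2(errors, complexities)
```

In this example the complexity peaks sit at thresholds 2 and 7, the complexity minimum between them is at 4, and the nearest local minimum of the error is at 3.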

[0038] This concludes the description of the optimal threshold calculation method used by the optimal threshold calculation means 4 of FIG. 1. The threshold storage means 5 of FIG. 1 stores the optimal threshold calculated by the optimal threshold calculation means 4.

[0039] The binarization means 6 binarizes the multi-valued image data with the threshold stored in the threshold storage means 5 to obtain the binary image data 7. All of the means described above can be implemented as a program running on a CPU, and some or all of them can also be implemented in hardware.

[0040] The embodiment described above binarizes the entire target multi-valued image with a single threshold, but it can of course also be applied to a binarization device that divides the multi-valued image into multiple regions and calculates an optimal threshold for each region.
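The region-wise variant can be sketched as a tiling loop that calls a threshold selector once per tile. The square tiling, the function names, and the `midpoint` selector are assumptions: `midpoint` is only a placeholder standing in for the optimal threshold calculation described above.

```python
def binarize_by_region(image, tile, choose_threshold):
    """Binarize each tile x tile region with its own threshold (1 = white, 0 = black)."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for top in range(0, rows, tile):
        for left in range(0, cols, tile):
            cells = [(r, c)
                     for r in range(top, min(top + tile, rows))
                     for c in range(left, min(left + tile, cols))]
            theta = choose_threshold([image[r][c] for r, c in cells])
            for r, c in cells:
                out[r][c] = 1 if image[r][c] >= theta else 0
    return out


def midpoint(values):
    """Placeholder selector, NOT the patent's rule: halfway between min and max."""
    return (min(values) + max(values)) / 2


# Made-up image: a high-contrast left half and a low-contrast right half.
image = [
    [10, 200, 100, 140],
    [20, 210, 110, 150],
]
result = binarize_by_region(image, tile=2, choose_threshold=midpoint)
```

A single global threshold would struggle with the low-contrast right half, whereas the per-tile thresholds (110 and 125 here) separate both halves.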

[0041]

[Effects of the Invention] As described above, the present invention calculates the optimal threshold from both the binarization error and the complexity of the binary image, and binarizes the multi-valued image data into binary image data with that threshold, so the threshold at which the binary image is optimal can be calculated. This prevents noise in the binary image and eliminates jagged boundaries in characters and line drawings, yielding a high-quality binary image.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of the image data binarization device of the present invention.

FIG. 2 is a flowchart showing the binarization error calculation method.

FIG. 3 is an explanatory diagram showing the histogram of a multi-valued image.

FIG. 4 is an explanatory diagram showing the characteristic of the mean squared error e(θ).

FIG. 5 is a flowchart showing the complexity calculation method.

FIG. 6 is an explanatory diagram of the number of connected components.

FIG. 7 is a flowchart showing the complexity calculation method based on the number of connected components.

FIG. 8 is an explanatory diagram of connected pixels.

FIG. 9 is a flowchart showing the complexity calculation method based on the boundary length.

FIG. 10 is an explanatory diagram of the pixel arrangement.

FIG. 11 is an explanatory diagram showing an example of a binary image.

FIG. 12 is a flowchart showing the complexity calculation method based on the number of primitives in the DF representation.

FIG. 13 is an explanatory diagram showing the image division method.

FIG. 14 is an explanatory diagram showing an example of a quadtree.

FIG. 15 is an explanatory diagram showing an example of a quadtree.

FIG. 16 is an explanatory diagram showing the characteristic of the complexity C(θ).

FIG. 17 is a flowchart showing an example of the optimal threshold calculation method.

FIG. 18 is a flowchart showing an example of the optimal threshold calculation method.

[Explanation of symbols]

1 Multi-valued image data input means
2 Binarization error calculation means
3 Complexity calculation means
4 Optimal threshold calculation means
5 Threshold storage means
6 Binarization means
7 Binary image data

Claims

[Claims]

1. An image data binarization device that binarizes multi-valued image data with a threshold, comprising: binarization error calculation means for calculating the binarization error that arises when the multi-valued image data is binarized; complexity calculation means for calculating the complexity of the binary image obtained by binarizing the multi-valued image data; optimal threshold calculation means for calculating, from the binarization error and the complexity, the threshold at which the binary image is optimal; and binarization means for binarizing the multi-valued image data into binary image data using the optimal threshold.