JPH10143607A

JPH10143607A - Image processor

Info

Publication number: JPH10143607A
Application number: JP8302078A
Authority: JP
Inventors: Masaru Sugioka; 賢杉岡; Koji Ito; 晃治伊東
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1996-11-13
Filing date: 1996-11-13
Publication date: 1998-05-29

Abstract

PROBLEM TO BE SOLVED: To appropriately separate a character part and a background part even for images for which a part of characters is thinned by comparing line widths corresponding to respective connection components with a threshold value, judging whether the connection component is the character part or the background part and outputting a result. SOLUTION: A connection component detection part 30 detects the connection component of the same color as character information from inputted source image data and the respective connection components detected by the connection component detection part 30 are stored in a connection component storage part 40. For the respective connection components outputted from the connection component storage part 40, a line width parameter is detected by a line width detection part 50. The line width parameter values of the respective connection components outputted to a character/background judgement part 60 are compared with the threshold value internally possessed by the character/background judgement part 60 and whether the respective connection components are a character data part or the background part is judged.

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention: The present invention relates to an image processing apparatus that separates the character portion of an image from its background portion. It is particularly suitable for an apparatus that removes noise, tint-block (background) patterns, and the like contained in an original image, such as a document, captured as a binary image.

【０００２】[0002]

2. Description of the Related Art: In optical character readers and facsimile machines, a document or the like is read optically with a CCD sensor or similar device to obtain an original image quantized into black and white binary values. For a document, this original image consists of image data of characters and line drawings (collectively called "characters" in this specification; this data is hereinafter called character data) and image data of the character background (hereinafter called background data). Background data often contains unwanted noise that hinders image processing, so the noise in the background data is commonly removed before image processing.

[0003] In addition, as seen in newspaper headlines, some documents have line patterns such as vertical, horizontal, or diagonal lines in the character background. In this case the background data includes image data of the line pattern (hereinafter called pattern data). If the original image data is processed with such pattern data still present, problems arise: the intended image processing cannot be carried out, or the processing becomes inefficient.

[0004] For example, when character recognition is performed with an optical character reader, segmenting the characters without first removing the pattern data is very difficult. In general, the pattern data is removed before character segmentation is performed.

[0005] Known methods for processing character-background image data such as pattern data and noise use black run lengths or erosion (shrinking) (see, for example, Reference 1, pp. 54-56).

[0006] Reference 1: Toshiyuki Sakai, "情報基礎学演習" (Exercises in Fundamental Information Science), Corona Publishing, 1986.

The method using black run lengths scans the original image data in, for example, the vertical and horizontal directions and removes black runs whose length is below a predetermined threshold. This removes unwanted elements, such as background line patterns and noise lines, that are thinner than the character strokes. When the character color is white, the same processing is applied using white run lengths.
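
The run-length filtering described above can be sketched roughly as follows. This is a minimal Python illustration, not the implementation from the cited reference: the function names, the 0/1 pixel encoding (1 = black), and the choice to evaluate the row scan and the column scan independently on the input and then combine the removals are all assumptions made for the example.

```python
def _short_run_mask(lines, threshold, color):
    """Mark every pixel that belongs to a run of `color` shorter than `threshold`."""
    mask = [[False] * len(line) for line in lines]
    for i, line in enumerate(lines):
        pos = 0
        while pos < len(line):
            if line[pos] != color:
                pos += 1
                continue
            start = pos
            while pos < len(line) and line[pos] == color:
                pos += 1
            if pos - start < threshold:
                for k in range(start, pos):
                    mask[i][k] = True
    return mask


def remove_short_runs(image, threshold, text_color=1):
    """Return a copy of `image` with short horizontal and vertical runs cleared.

    Assumption: both scans read the unmodified input, and a pixel is cleared
    if either scan finds it in a run shorter than `threshold`.
    """
    background = 1 - text_color
    row_mask = _short_run_mask(image, threshold, text_color)
    columns = [list(col) for col in zip(*image)]  # transpose for the vertical scan
    col_mask = _short_run_mask(columns, threshold, text_color)
    return [
        [background if (row_mask[r][c] or col_mask[c][r]) else image[r][c]
         for c in range(len(image[r]))]
        for r in range(len(image))
    ]
```

With `threshold=2`, a one-pixel-wide line is cleared whichever way it runs, because the runs that cross it have length 1. A character stroke of the same width is cleared in exactly the same way.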

[0007] The method using erosion, for example, repeatedly applies 4-neighbor erosion to the whole original image until the background line pattern disappears, and then applies dilation to restore the characters.
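
A minimal sketch of 4-neighbor erosion (contraction) followed by dilation (expansion), assuming a 0/1 image with 1 = black and treating pixels outside the image as white. Neither detail is given in the text, and the function names are hypothetical. Only a single erosion pass is shown, not the loop that repeats until the line pattern disappears.

```python
NEIGHBORS_4 = ((-1, 0), (1, 0), (0, -1), (0, 1))


def _is_black(image, r, c):
    # Assumption: anything outside the image counts as white.
    return 0 <= r < len(image) and 0 <= c < len(image[r]) and image[r][c] == 1


def erode4(image):
    """A pixel stays black only if it and its four neighbors are all black."""
    return [
        [1 if _is_black(image, r, c)
              and all(_is_black(image, r + dr, c + dc) for dr, dc in NEIGHBORS_4)
         else 0
         for c in range(len(image[r]))]
        for r in range(len(image))
    ]


def dilate4(image):
    """A pixel becomes black if it or any of its four neighbors is black."""
    return [
        [1 if _is_black(image, r, c)
              or any(_is_black(image, r + dr, c + dc) for dr, dc in NEIGHBORS_4)
         else 0
         for c in range(len(image[r]))]
        for r in range(len(image))
    ]
```

A one-pixel-wide line is erased by `erode4` and cannot be brought back by `dilate4`, since dilation only grows pixels that survived.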

【０００８】[0008]

Problems to Be Solved by the Invention: With either of the above methods, however, the background cannot be removed accurately unless the background line patterns, noise lines, and the like are thinner than the character strokes.

[0009] FIG. 2(a) shows an example of a line-patterned image in which a black background pattern is added to an image of characters whose strokes consist of black pixels (hereinafter, black characters). FIG. 2(b) shows an example in which a white background pattern is added to an image of characters whose strokes consist of white pixels (hereinafter, white characters). In both examples the horizontal strokes of the characters are about as thick as the background lines.

[0010] For example, if black run lengths are used to remove the background data, a black vertical-line pattern, from the original image data of FIG. 2(a), the horizontal strokes of the characters "文字" disappear, as shown in FIG. 3(a). With erosion, the horizontal strokes of "文字" likewise disappear, as shown in FIG. 3(b), and they cannot be restored even if dilation is applied afterward.

[0011] When the original image is printed in a typeface similar to Mincho, as in FIG. 2(a), the horizontal strokes of the characters are extremely thin. The conventional methods above therefore cannot remove only the background accurately, which hinders later processing such as character segmentation and character recognition.

[0012] An image processing apparatus is therefore desired that can properly separate the character portion from the background portion even in an image in which parts of the characters are thin.

【００１３】[0013]

Means for Solving the Problems: To solve this problem, the image processing apparatus of the present invention comprises: (1) connected component detecting means for detecting, from original image data, connected components having the same color as the character strokes; (2) line width detecting means for detecting the line width of each connected component; and (3) character/background determining means for determining whether each connected component is a character portion or a background portion by comparing the line width of that component with a threshold, and for outputting the determination result.

【００１４】[0014]

BEST MODE FOR CARRYING OUT THE INVENTION

(A) First Embodiment

A first embodiment of the image processing apparatus according to the present invention is described in detail below with reference to the drawings.

[0015] (A-1) Configuration of the First Embodiment

FIG. 1 is a functional block diagram of the image processing apparatus of this embodiment. As shown in FIG. 1, the apparatus comprises an image recording medium 10, an image storage unit 20, a connected component detection unit 30, a connected component storage unit 40, a line width detection unit 50, a character/background determination unit 60, a background removal unit 70, and an output image storage unit 90.

[0016] The image recording medium 10 stores, as binary images, original images of a plurality of documents or the like containing characters, line drawings, and so on.

[0017] The image storage unit 20 receives from the image recording medium 10 the binary image currently being processed as original image data, stores it (for example, storing white as "0" and black as "1"), and outputs it to the connected component detection unit 30 and the background removal unit 70, both described below.

[0018] The connected component detection unit 30 handles character strokes of either color, black or white. It receives character color information indicating the character color of the current original image, together with the original image data output from the image storage unit 20, and detects from the original image data the connected components whose color matches the character color information.

[0019] Specifically, the connected component detection unit 30 receives the original image data from the image storage unit 20 and scans all of it. It detects, as a connected component, each set of pixels that have the color given by the character color information and are 4-connected to one another, and outputs the components to the connected component storage unit 40 and the background removal unit 70. For example, from the 4-neighborhood of a target pixel (the four pixels above, below, left, and right of it), it selects the pixels of the same color as the target pixel and repeats the same processing on the selected pixels, thereby detecting a connected component.
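
The 4-connected component detection described in this paragraph can be sketched as a breadth-first flood fill. The sketch below is an illustration under assumptions rather than the patented circuit: the function name, the list-of-lists image with 0 = white and 1 = black, and the representation of a component as a list of `(row, column)` pairs are all hypothetical.

```python
from collections import deque


def find_connected_components(image, text_color=1):
    """Return a list of components; each is a list of (row, col) pixels.

    Only pixels equal to `text_color` are grouped, and only the four
    neighbors above, below, left and right count as connected.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if image[r][c] != text_color or seen[r][c]:
                continue
            # Start a new component and grow it through 4-neighbors.
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx]
                            and image[ny][nx] == text_color):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components
```

Passing `text_color=0` groups white pixels instead, which corresponds to the white-character case. Diagonally touching pixels end up in separate components because only the 4-neighborhood is followed.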

[0020] The connected component storage unit 40 stores the one or more connected components detected by the connected component detection unit 30 and outputs them to the line width detection unit 50.

[0021] For each connected component stored in the connected component storage unit 40, the line width detection unit 50 detects a value that reflects the line width of the component (hereinafter called the line width parameter) and outputs it to the character/background determination unit 60.

[0022] Specifically, the line width detection unit 50 has a 2-row by 2-column observation window consisting of cells a0 to a3, as shown in FIG. 4. It raster-scans this 2×2 window so that every pixel of the image data in the memory of the connected component storage unit 40 (the image data of each connected component) is viewed both through cell a0 and through the whole window a0 to a3.

[0023] At each window position, the pixels in the memory of the connected component storage unit 40 that are visible through the 2×2 window are examined. When all of them have the character color indicated by the character color information (black in this description), the four-black-point count Q is incremented by one. When the pixel visible through cell a0 has the character color, the black-point count A is incremented by one.

[0024] The initial values of Q and A are 0. During the scan, whenever part of the 2×2 window lies outside the memory of the connected component storage unit 40, the pixels seen through that part are treated as having a color other than the character color.

[0025] For each connected component, the line width parameter W is defined, as in equation (1), as the black-point count A divided by the difference between A and the four-black-point count Q.

[0026] Line width parameter: W = A / (A - Q) … (1)

The character/background determination unit 60 compares the line width parameter of each connected component detected by the line width detection unit 50 with a predetermined threshold to determine whether the component is a character data portion or a background data portion. That is, the unit stores in advance a threshold for the line width parameter. When the line width parameter of a connected component is greater than or equal to the threshold, the component is judged to be a character data portion; when it is less than the threshold, the component is judged to be a background data portion.
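
As a rough illustration of equation (1) and the threshold test, the sketch below counts A and Q for one connected component given as `(row, column)` pixels. The function names, the pixel-set representation, the placement of cell a0 at the top-left of the 2×2 window, and the value returned for an empty component are assumptions, not details taken from the text. For an idealized solid rectangle w pixels wide and L pixels long, A = wL and Q = (w - 1)(L - 1), so W = wL / (w + L - 1), which approaches w as the stroke gets longer. That is why W behaves like a line width.

```python
def line_width_parameter(pixels):
    """Compute W = A / (A - Q) for one connected component.

    A: number of pixels in the component.
    Q: number of 2x2 window positions whose four cells all lie in the component
       (cells outside the component, or outside the image, count as non-text).
    """
    on = set(pixels)
    a = len(on)
    if a == 0:
        return 0.0  # assumption: the text does not define W for an empty component
    q = sum(
        1
        for (r, c) in on
        if (r, c + 1) in on and (r + 1, c) in on and (r + 1, c + 1) in on
    )
    # q < a always holds for a non-empty component, so the division is safe.
    return a / (a - q)


def classify_component(pixels, threshold):
    """Character if W >= threshold, otherwise background."""
    if line_width_parameter(pixels) >= threshold:
        return "character"
    return "background"
```

For example, a solid 3×30 stroke gives A = 90, Q = 58 and W = 90 / 32 ≈ 2.81, while a one-pixel-wide line of any length gives Q = 0 and W = 1.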

[0027] The background removal unit 70 receives the determination result output from the character/background determination unit 60. When a connected component output from the connected component detection unit 30 is a background data portion, it removes that component from the original image data output from the image storage unit 20.

[0028] Specifically, the background removal unit 70 receives the original image data from the image storage unit 20, the position of each connected component within the original image data from the connected component detection unit 30, and the determination result from the character/background determination unit 60. When a connected component is a background data portion, the unit inverts black and white in that part of the original image data and outputs it to the output image storage unit 90. When the component is a character data portion, the unit outputs that part of the original image data unchanged to the output image storage unit 90.
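
The removal step amounts to flipping the pixels of every component judged to be background. A minimal sketch follows, assuming a 0/1 image, components given as lists of `(row, column)` positions, and a parallel list of boolean flags; the function and parameter names are hypothetical.

```python
def remove_background(image, components, is_background):
    """Return a copy of `image` with every background component inverted.

    `components[i]` lists the (row, col) pixels of component i and
    `is_background[i]` is True when that component was judged to be background.
    Character components are copied through unchanged.
    """
    output = [row[:] for row in image]  # leave the original image data untouched
    for pixels, background in zip(components, is_background):
        if not background:
            continue
        for r, c in pixels:
            output[r][c] = 1 - output[r][c]  # black <-> white
    return output
```

Because the components consist only of character-colored pixels, inverting them works for both black and white characters without a separate code path.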

[0029] The output image storage unit 90 stores the image data, with the background data portions removed, output from the background removal unit 70.

[0030] (A-2) Operation of the First Embodiment

The image processing apparatus of the first embodiment, configured as described above, operates as follows.

[0031] Binary images of a plurality of documents or the like containing characters are stored on the image recording medium 10. The binary image to be processed is output from the image recording medium 10, stored in the image storage unit 20, and output to the connected component detection unit 30 and the background removal unit 70. The connected component detection unit 30 detects, from the input original image data, the connected components whose color matches the character color information. Each detected connected component is stored in the connected component storage unit 40. For each connected component output from the connected component storage unit 40, the line width detection unit 50 detects the line width parameter W.

[0032] For example, when FIG. 2(a), an image consisting of black characters and a background containing a black line pattern, is input to the apparatus as the original image, the line width parameters W detected by the line width detection unit 50 are as shown in FIGS. 5(a) and 5(b). According to FIG. 5(a), the background data portion of the image in FIG. 2(a) contains 56 connected components, whose W values lie in the range 1.0 to 2.0. According to FIG. 5(b), the character data portion contains 4 connected components, whose W values lie in the range 3.0 to 3.6.

[0033] The line width parameter of each connected component, output to the character/background determination unit 60, is compared with the threshold held inside that unit to determine whether the component is a character data portion or a background data portion.

[0034] For example, when the image of FIG. 2(a) is input, the line width parameters of the character data portion range from 3.0 to 3.6 and those of the background data portion from 1.0 to 2.0, as described above. If the threshold in the character/background determination unit 60 is set to, for example, 2.5, each connected component is correctly classified as a character data portion or a background data portion.

[0035] When a connected component output from the connected component detection unit 30 is a background data portion, the background removal unit 70 inverts black and white in that part of the original image data output from the image storage unit 20 and outputs it to the output image storage unit 90. When the connected component is a character data portion, the unit outputs that part of the original image data unchanged to the output image storage unit 90.

[0036] The image data from which the background data portions have been removed is stored in the output image storage unit 90.

[0037] FIG. 6(a) shows an example of the image obtained by the first embodiment when the background data portion is removed from the black-character image of FIG. 2(a), which contains a background data portion, and only the desired characters are extracted.

[0038] FIG. 6(b) shows an example of the image obtained by the first embodiment when the background data portion is removed from the white-character image of FIG. 2(b), which contains a background data portion, and only the desired characters are extracted.

[0039] (A-3) Effects of the First Embodiment

The image processing apparatus of the first embodiment detects, from the original image data (character data and background data), the connected components whose color matches the character color information, calculates a line width parameter for each component, and compares it with a threshold to determine whether the component is a character data portion or a background data portion. This yields the following effects.

[0040] Even when parts of the characters in the original image are thin, the line width parameter takes a reasonably stable value, so character strokes are easily distinguished from background line patterns, noise lines, and the like by that value. Only unwanted lines such as line patterns and noise lines are removed, and thin character strokes are not lost.

[0041] As a result, character segmentation and character recognition are performed accurately. The apparatus is particularly effective for recognizing characters in material such as newspaper headlines, where character data and background data are mixed.

[0042] (B) Second Embodiment

Next, a second embodiment of the image processing apparatus according to the present invention is described with reference to the drawings.

[0043] In addition to the functions of the image processing apparatus of the first embodiment, the second embodiment has a function that automatically determines the color of the character strokes in the input image.

[0044] (B-1) Configuration of the Second Embodiment

FIG. 7 is a functional block diagram of the image processing apparatus of the second embodiment. As shown in FIG. 7, the apparatus comprises an image recording medium 10, an image storage unit 20, a connected component detection unit 30, a connected component storage unit 40, a line width detection unit 50, a character/background determination unit 60, a background removal unit 70, a character color determination unit 82, and an output image storage unit 90.

[0045] The second embodiment differs from the first as follows. In the first embodiment, the character color information is supplied from outside to the connected component detection unit 30 and the line width detection unit 50. In the second embodiment, the character color determination unit 82 determines the color of the character strokes in the original image and supplies the result, as character color information, to the connected component detection unit 30 and the line width detection unit 50.

[0046] The functions of the image recording medium 10, the image storage unit 20, the connected component detection unit 30, the connected component storage unit 40, the line width detection unit 50, the character/background determination unit 60, the background removal unit 70, and the output image storage unit 90 were described in detail for the first embodiment, so their description is omitted here.

[0047] The character color determination unit 82 receives the original image data output from the image storage unit 20, determines the color of the character strokes in it, and outputs the result as character color information. By function, the unit divides into three components, each of which is described in detail below.

[0048] FIG. 8 is a functional block diagram of the character color determination unit 82. The unit comprises a black-block ratio calculation unit 84, an image erosion unit 86, and a determination unit 88. Each is described in detail below.

[0049] The image erosion unit 86 receives the original image data output from the image storage unit 20, holds a memory area of the same size as the original image data, creates eroded image data by applying 8-neighbor erosion to the original image data, and outputs the eroded image data.
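
A minimal sketch of 8-neighbor erosion (contraction), assuming a 0/1 image with 1 = black and treating pixels outside the image as white; the text does not specify the border handling, and the function name is hypothetical. The result is written to a new image of the same size, matching the separate memory area mentioned above.

```python
def erode8(image):
    """Keep a pixel black only if its whole 3x3 neighborhood is black."""
    height = len(image)
    width = len(image[0]) if height else 0

    def is_black(r, c):
        # Assumption: pixels outside the image count as white.
        return 0 <= r < height and 0 <= c < width and image[r][c] == 1

    return [
        [1 if all(is_black(r + dr, c + dc)
                  for dr in (-1, 0, 1) for dc in (-1, 0, 1))
         else 0
         for c in range(width)]
        for r in range(height)
    ]
```

Under these assumptions a solid 5×5 black square shrinks to its inner 3×3 square, and any stroke one or two pixels wide vanishes entirely.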

[0050] The black-block ratio calculation unit 84 receives the original image data output from the image storage unit 20 and the eroded image data output from the image erosion unit 86, and calculates the black-block ratio of each.

[0051] Specifically, the black-block ratio calculation unit 84 has a 3-row by 3-column observation window (3×3 mask) consisting of cells b1 to b9, as shown in FIG. 9. It raster-scans this window so that every pixel of the original image data and of the eroded image data is viewed both through the center cell b5 and through the whole 3×3 mask b1 to b9. When the pixel at cell b5 is black, the black-point count A2 is incremented by one. When all pixels visible through the 3×3 window are black, the black-block count B is incremented by one. The initial values of A2 and B are 0.

[0052] As in equation (2), the black-block ratio KG of the original image data is defined as the ratio of its black-block count BG to its black-point count A2G. Likewise, as in equation (3), the black-block ratio KS of the eroded image data is defined as the ratio of its black-block count BS to its black-point count A2S. When A2G is 0, KG is set to 0; when A2S is 0, KS is set to 0.

[0053]
Black block ratio of the original image data: KG = BG / A2G ... (2)
Black block ratio of the contracted image data: KS = BS / A2S ... (3)

The determination unit 88 compares the black block ratio KG of the original image data with the black block ratio KS of the contracted image data, both output from the black block ratio calculation unit 84, thereby determines the character color of the original image data, and outputs the result as character color information. That is, when KG is smaller than KS, the determination unit 88 determines that the characters of the original image data are black; when KG is equal to or greater than KS, it determines that the characters are white. The determination result is output as character color information to the connected component detection unit 30 and the line width detection unit 50.
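Equations (2) and (3) and the comparison rule can be written as two small functions. This is only a sketch: the function names and the string return values are illustrative choices, while the zero-count guard and the "smaller than" versus "equal to or greater than" split follow the text.

```python
def black_block_ratio(num_blocks, num_points):
    """K = B / A2, defined as 0 when there are no black points."""
    if num_points == 0:
        return 0.0
    return num_blocks / num_points


def judge_character_color(kg, ks):
    """Black characters when KG < KS, white characters when KG >= KS."""
    return "black" if kg < ks else "white"
```

Note that a tie (KG equal to KS, including the case where both are 0) falls on the "white" side of the rule as the text states it.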

[0054] (B-2) Operation of the Second Embodiment
As described above, the second embodiment differs from the first embodiment in that the character color information, which is supplied from outside in the first embodiment, is generated in the second embodiment by the character color determination unit 82 included in the image processing apparatus itself.

[0055] Accordingly, for the second embodiment, only the operation of the character color determination unit 82 is described in detail.

[0056] The image contraction unit 86 creates contracted image data by applying 8-neighborhood contraction to the original image data output from the image storage unit 20. The black block ratio calculation unit 84 counts the number of black points A2G and the number of black blocks BG from the original image data output from the image storage unit 22, and calculates the black block ratio KG of the original image data as the ratio of BG to A2G. Similarly, from the contracted image data output from the image contraction unit 86, it counts the number of black points A2S and the number of black blocks BS and calculates the black block ratio KS of the contracted image data. The determination unit 88 compares the ratios KG and KS output from the black block ratio calculation unit 84, determines the color of the characters in the original image data from their relative magnitude, and outputs the result as character color information.
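The data flow of this paragraph (contract, count, compare) might look as follows in Python. This is a sketch under stated assumptions rather than the patented implementation: the function names are hypothetical, the image is a list of rows with 1 = black, and pixels outside the image are assumed to be white. Under that border assumption, the pixels whose 3 × 3 window is entirely black are exactly the pixels that survive 8-neighborhood contraction, so the sketch counts black blocks by contracting once more.

```python
def erode8(img):
    """8-neighborhood contraction: a pixel stays black only if it and all
    eight neighbors are black. Border pixels always become white (assumption)."""
    height = len(img)
    width = len(img[0]) if height else 0
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            if all(img[y + dy][x + dx] == 1
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                out[y][x] = 1
    return out


def count_black(img):
    return sum(sum(row) for row in img)


def determine_character_color(img):
    contracted = erode8(img)                # role of image contraction unit 86
    a2g = count_black(img)                  # black points, original image
    bg = count_black(erode8(img))           # black blocks, original image
    a2s = count_black(contracted)           # black points, contracted image
    bs = count_black(erode8(contracted))    # black blocks, contracted image
    kg = bg / a2g if a2g else 0.0           # role of calculation unit 84
    ks = bs / a2s if a2s else 0.0
    return "black" if kg < ks else "white"  # role of determination unit 88
```

Intuitively, thin black background lines add black points but no black blocks, so removing them by contraction raises the ratio when the characters are black.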

[0057] For example, when FIG. 2(a) and FIG. 2(b) are each input to the image processing apparatus of the second embodiment as the original image, the black block ratio calculation unit 84 and the determination unit 88 operate as follows.

[0058] First, when FIG. 2(a) is used as the original image data, the black block ratio calculation unit 84 counts the number of black points A2G of the original image data as 7577 and the number of black blocks BG as 1339, so the black block ratio KG of the original image data is calculated as 0.17. For the contracted image data, the number of black points A2S is counted as 1339 and the number of black blocks BS as 530, so the black block ratio KS is calculated as 0.39. Since KG is smaller than KS, the determination unit 88 determines that the original image data contains black characters and outputs this result as character color information.

[0059] On the other hand, when FIG. 2(b) is used as the original image data, the black block ratio calculation unit 84 counts the number of black points A2G of the original image data as 40645 and the number of black blocks BG as 33588, so the black block ratio KG of the original image data is calculated as 0.82. For the contracted image data, the number of black points A2S is counted as 33588 and the number of black blocks BS as 26097, so the black block ratio KS is calculated as 0.77. Since KG is larger than KS, the determination unit 88 determines that the original image data contains white characters and outputs this result as character color information.
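The two worked examples can be re-derived from the stated counts. The short script below is only a check of the arithmetic; the helper names are hypothetical. It assumes the quoted ratios are truncated (not rounded) to two decimal places, which is the only reading that reproduces all four published values (for instance, 1339 / 7577 is about 0.1767, quoted as 0.17).

```python
def truncated_hundredths(num_blocks, num_points):
    """Ratio in integer hundredths, truncated; 0 when there are no black points."""
    return 0 if num_points == 0 else (num_blocks * 100) // num_points


def check_example(a2g, bg, a2s, bs):
    kg_exact = bg / a2g if a2g else 0.0
    ks_exact = bs / a2s if a2s else 0.0
    color = "black" if kg_exact < ks_exact else "white"
    return truncated_hundredths(bg, a2g), truncated_hundredths(bs, a2s), color


examples = {
    "FIG. 2(a)": (7577, 1339, 1339, 530),
    "FIG. 2(b)": (40645, 33588, 33588, 26097),
}
for name, counts in examples.items():
    kg, ks, color = check_example(*counts)
    print(f"{name}: KG=0.{kg:02d} KS=0.{ks:02d} -> {color} characters")
# FIG. 2(a): KG=0.17 KS=0.39 -> black characters
# FIG. 2(b): KG=0.82 KS=0.77 -> white characters
```

In both examples A2S equals BG, which is what one would expect if the contracted image keeps exactly those pixels whose 3 × 3 window is entirely black.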

[0060] The character color information output from the character color determination unit 82 is input to the connected component detection unit 30 and the line width detection unit 50. The subsequent operation of the second embodiment is the same as that of the first embodiment, so its description is omitted.

[0061] FIG. 6(a) is an example of an image obtained by determining the character color of the image shown in FIG. 2(a) according to the second embodiment (the character color is black) and then performing the same operation as in the first embodiment, thereby removing the background data portion from the image of FIG. 2(a) and extracting only the target characters. Likewise, FIG. 6(b) is an example of an image obtained by determining the character color of the image of FIG. 2(b) according to the second embodiment (the character color is white), removing the background data portion, and extracting only the target characters.

[0062] (B-3) Effects of the Second Embodiment
The image processing apparatus of the second embodiment also detects connected components of the same color as the character color from the original image data, calculates a line width parameter value for each connected component, and compares it with a threshold value to determine whether the connected component is a character data portion or a background data portion. As a result, even when part of a character in the original image is thin, a reasonably stable line width parameter value is obtained. The line width parameter value makes it easy to distinguish character lines from unwanted background elements such as background line patterns and noise lines, so that only the unwanted background elements are removed without losing thin character line portions.

[0063] In addition, the image processing apparatus of the second embodiment calculates the black block ratio of the original image data and that of the contracted image data and compares the two, which gives it the ability to determine the character color of the original image automatically. Therefore, even when it is not known in advance whether a black character image or a white character image will be input, the character portion and the background portion can be separated appropriately.

[0064] For this reason, the image processing apparatus of the second embodiment is effective when applied to character recognition of material such as newspaper headlines, where it is not standardized whether text appears as black characters or as white characters.

[0065] (C) Other Embodiments
In the first and second embodiments, the threshold value needed to distinguish the character data portion from the background data portion by the line width parameter is stored in advance in the character background determination unit. Alternatively, an appropriate threshold value may be set automatically, for example by estimating the character size from information such as the width and height of the original image. The threshold value may also be made freely settable by the user from outside.

[0066] In the first and second embodiments, when a connected component is a background data portion, the background removal unit outputs that connected component with black and white inverted. The invention is not limited to this. Alternatively, the output image in the output image storage unit may be initialized to a color different from the character color, and connected components judged by the character background determination unit to be background data portions may simply not be output from the background removal unit, so that the background data portion is removed as a result.
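The alternative removal method described here can be sketched as follows. The data layout is an assumption made for illustration: each connected component is given as a list of (row, column) pixel coordinates paired with a flag saying whether it was judged to be a character portion, and colors are 1 = black, 0 = white. The function name is hypothetical.

```python
def render_characters_only(height, width, components, char_color=1):
    """Build the output image by writing only character components.

    components: iterable of (pixels, is_character), where pixels is a list
    of (y, x) tuples. This layout is an illustrative assumption.
    """
    background_color = 1 - char_color
    # Initialize the output image to a color different from the character color.
    out = [[background_color] * width for _ in range(height)]
    for pixels, is_character in components:
        if not is_character:
            continue  # background components are never output
        for y, x in pixels:
            out[y][x] = char_color
    return out
```

Compared with inverting background components in place, this variant never touches the original image and needs no second pass over rejected components.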

[0067] Furthermore, in the second embodiment, when the character color of the original image is white, the final output image (the image stored in the output image storage unit) has white characters on a black character background. Alternatively, after the character color determination unit has determined the character color, the entire original image data may be inverted between black and white only when the character color of the original image is white, and the processing from the connected component detection unit onward may then proceed with the characters treated as black.

[0068] Furthermore, in the second embodiment, the image contraction unit contracts the image using 8-neighborhood contraction, but the invention is not limited to this method. Any contraction method appropriate to the resolution of the image may be used; for example, 4-neighborhood contraction may be used.

[0069] In the second embodiment, the character color determination unit calculates the black block ratio of the original image data and that of the contracted image data and compares the two to determine whether the image is a black character image or a white character image. Alternatively, the black block ratio of the original image data and that of expanded (dilated) image data may be calculated and compared to make the same determination. In this case, the criterion for determining the character color from the relative magnitude of the two black block ratios is the reverse of that in the embodiment.

[0070] Furthermore, in the second embodiment, the black block ratio calculation unit defines a black block as a set of nine black pixels filling a 3 × 3 mask, but the invention is not limited to this. For example, a black block may be defined as a set of black pixels having a size and shape appropriate to the resolution of the image. The character color may also be determined using white blocks. In this case as well, the criterion for determining the character color from the relative magnitude of the white block ratio of the original image data and that of the contracted image data is the reverse of that in the embodiment.

[0071] Furthermore, in the first and second embodiments, an image recording medium is used as the input device, but the invention is not limited to this. For example, an image reading device such as a scanner or a facsimile machine may be used as the input device.

[0072] In the first and second embodiments, the line width detection unit detects the line width parameter from the number of black points and the number of 4-black points, but any other method capable of detecting the line width of a connected component may of course be used.

[0073] Furthermore, in the second embodiment, the character color determination unit calculates the black block ratio for the original image data and for the contracted image data derived from it, and determines the color of the character lines of the original image data by comparing the two ratios. Any other method capable of determining the color of the character lines of the original image data may be used instead.

[0074]

[Effects of the Invention] As described above, according to the present invention, connected components of the same color as the character color are detected from the original image data, a line width is calculated for each connected component, and the line width is compared with a threshold value to determine whether the connected component is a character data portion or a background data portion. Therefore, even for an image in which part of a character is thin, the character portion and the background portion can be separated appropriately.

[Brief Description of the Drawings]

FIG. 1 is a functional block diagram of the first embodiment.

FIG. 2 shows examples of original images.

FIG. 3 shows an example image after the background data portion has been removed.

FIG. 4 is a diagram showing the observation window for detecting black points.

FIG. 5 shows connected components and their corresponding line width parameter values.

FIG. 6 shows images after the background data portion has been removed.

FIG. 7 is a functional block diagram of the second embodiment.

FIG. 8 is a functional block diagram of the character color determination unit 82.

FIG. 9 is a diagram showing the observation window for detecting black blocks.

[Explanation of Symbols]

30: connected component detection unit; 50: line width detection unit; 60: character background determination unit.

Claims

1. An image processing apparatus comprising: connected component detecting means for detecting, from original image data, connected components having the same color as the character line color; line width detecting means for detecting a line width of each of the connected components; and character background determining means for determining whether each of the connected components is a character portion or a background portion by comparing the detected line width with a threshold value, and outputting the determination result.

2. The image processing apparatus according to claim 1, further comprising background removing means for removing only the connected component from the original image data when the determination result of the character background determining means indicates that the connected component is background.

3. The image processing apparatus according to claim 1 or 2, further comprising character color determining means for determining the color of the character lines of the original image data.

4. The image processing apparatus according to claim 3, wherein the character color determining means: creates contracted image data from the original image data; for each of the original image data and the contracted image data, counts, as the number of character color pixels, the pixels having the same color as the character color among all pixels constituting the image data; taking each pixel constituting the image data in turn as a pixel of interest, counts, as the number of character color blocks, the pixels of interest whose surrounding pixels are all the same color as the character color; calculates, as a character color block ratio, the value obtained by dividing the number of character color blocks by the number of character color pixels; and determines the color constituting the characters from the character color block ratio of the original image data and the character color block ratio of the contracted image data.