JP2000341515A

JP2000341515A - Method and device for extracting text area and recording medium

Info

Publication number: JP2000341515A
Application number: JP11151565A
Authority: JP
Inventors: Hirofumi Nishida; 広文西田
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1999-05-31
Filing date: 1999-05-31
Publication date: 2000-12-08

Abstract

PROBLEM TO BE SOLVED: To reliably extract a text area printed on a colored background from a color image. SOLUTION: A color image expressed in R, G, and B is converted to color coordinates that reflect human visual characteristics and whose components are highly independent (S20), and image texture features are extracted for each converted component (S30). Each pixel is then classified as belonging to a text area or a non-text area using the image texture features (S40), and the connected components of the pixels classified as text are extracted as the text area (S50).

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to the field of document image processing and, more particularly, to a technique for extracting text areas composed of characters, symbols, and the like from a digital color image.

[0002]

[Description of the Related Art] Documents used to be printed mainly in black and white, but with the spread of color printers, documents are increasingly produced in color. Most document image processing techniques have likewise targeted black-and-white binary images or grayscale images, but the need for techniques that handle color document images is growing.

[0003] Applications of such color document image processing include document image recognition, in which a document printed on paper is input to a computer through an image input device such as an image scanner and the resulting digital color image is processed so that OCR can encode text information such as characters, and image area separation, in which a digital copier separates character areas from halftone areas in the input image and processes them differently to improve image quality. Either kind of processing first requires extracting the text areas, composed of characters, symbols, and the like, from the digital color image.

[0004] A known conventional technique for extracting text areas from a digital color image is the method described in [Anil K. Jain and Yao Chen, "Address block location using color and texture analysis," CVGIP: Image Understanding, vol. 60, no. 2, pp. 179-190, September 1994]. In this method, the luminance value of each pixel of a digital color image represented by the three RGB components is calculated, and a grayscale image is generated from that luminance value (or with the G value substituted for the luminance value). Each pixel value of this grayscale image is binarized with a threshold to separate white areas from all other areas. Separately, the texture features of the grayscale image are used to separate text areas from all other areas. The two results are then combined: of the areas that the image texture features identified as text, only those surrounded by a white area are finally extracted as text areas.
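The grayscale step of this prior-art method can be illustrated with a minimal sketch. The luminance weights (the common Rec. 601 values), the threshold of 200, and the function names are assumptions chosen for illustration; neither the patent nor this text specifies them. The toy image shows why a saturated colored background is a problem: it maps to a dark gray value and is never treated as "white".

```python
import numpy as np

def luminance(rgb):
    """Weighted sum of R, G, B per pixel (Rec. 601 weights, an assumption)."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb.astype(float) @ weights

def white_mask(rgb, threshold=200.0):
    """True where the gray value exceeds the (hypothetical) threshold."""
    return luminance(rgb) > threshold

# 1x3 image: white paper, black ink, saturated blue background
img = np.array([[[255, 255, 255], [0, 0, 0], [0, 0, 255]]], dtype=np.uint8)
mask = white_mask(img)  # only the first pixel counts as white
```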

[0005]

[Problems to Be Solved by the Invention] The prior art described above is effective for extracting areas of text printed on a white background. It is also fast, because the image texture features are computed with Gabor filters, which can be implemented in hardware as digital FIR filters. However, its extraction accuracy is poor for text areas printed on a colored background. In particular, normal text area extraction is difficult when the background and the text have similar gray values in the grayscale image, or when the gray value of the background is close to black.

[0006] Accordingly, an object of the present invention is to solve this problem and to provide a method and an apparatus that accurately extract text areas from a wide variety of color document images. Another object of the present invention is to enable text area extraction that is closer to human color perception.

[0007]

[Means for Solving the Problems] To achieve the above objects, the present invention extracts an image texture feature for each component of the digital color image to be processed and uses the extracted image texture features to classify each pixel of that image into a text area or a non-text area. Preferably, the digital color image to be processed is first converted to other color coordinates that reflect human visual characteristics and whose components are highly independent; the image texture features of each converted component are then extracted and used to classify each pixel.

[0008]

[Embodiments of the Invention] Preferred embodiments of the present invention are described with reference to the accompanying drawings. FIG. 1 is a flowchart showing an example of the text area extraction procedure, and FIG. 2 is a flowchart showing an example of the image texture feature extraction procedure. FIG. 3 is a block diagram showing an example of a text area extraction apparatus that executes the processing shown in FIGS. 1 and 2. The following description follows the processing flow of FIG. 1 or FIG. 2 and relates the content of the text area extraction processing to the configuration of the text area extraction apparatus.

[0009] <<Step S10>> The color image input unit 100 acquires a digital color image containing a text area through a color image input device such as an image scanner or digital copier, through a communication line, or from a recording medium such as a floppy disk or magneto-optical disk. In the following description, the color of each pixel of the input digital color image is assumed to be represented in color coordinates corresponding to R, G, and B. The input digital color image is stored in the memory 101.

[0010] <<Step S20>> The color coordinate conversion unit 102 converts the digital color image stored in the memory 101 to other color coordinates that better reflect human visual characteristics and whose components are highly independent, so that the R, G, and B components are expressed by three different components. Collecting the value of each pixel for each component of the converted color coordinates yields three gray-value images: a first component image, a second component image, and a third component image. These component images are stored in the memories 103, 104, and 105, respectively. As the converted color coordinates, color coordinates consisting of a lightness component and two independent hue components, such as the following, can be used.

[0011] (1) Pseudo-KL space: the conversion from RGB space to pseudo-KL space is given by the following matrix.

(Equation 1)

The first component I1 represents lightness (luminance), and the second component I2 and the third component I3 represent hue.

[0012] (2) YyCxCz space: a space that linearly approximates the CIE Lab space. The conversion from XYZ space is given by the following matrix.

(Equation 2)

The first component Yy represents lightness (luminance), and the second component Cx and the third component Cz represent hue. The conversion from RGB space to XYZ space is given by the following matrix.

(Equation 3)

[0013] (3) Yy-rg-yb space: the conversion from XYZ space is given by the following matrix.

(Equation 4)

[0014] Taking Equation 3 above into account, both YyCxCz and Yy-rg-yb are color spaces whose second component corresponds to the red-green opponent-color response and whose third component corresponds to the yellow-blue opponent-color response.

[0015] In addition to the conversions listed above, conversion to other spaces, such as the YIQ space used in television (Y is luminance; I and Q are color differences), can also be used.
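Each of the conversions named above is a per-pixel 3x3 linear transform followed by a split into three component images. The sketch below illustrates this with YIQ, since the matrices of Equations 1 to 4 are not reproduced in this text. The coefficients are the commonly cited approximate NTSC values and are an assumption, as are the names `RGB_TO_YIQ` and `to_component_images`; any other 3x3 matrix could be passed in the same way.

```python
import numpy as np

# Commonly cited approximate NTSC RGB -> YIQ coefficients (an assumption here;
# the patent text does not list them).
RGB_TO_YIQ = np.array([
    [0.299,  0.587,  0.114],
    [0.596, -0.274, -0.322],
    [0.211, -0.523,  0.312],
])

def to_component_images(rgb, matrix=RGB_TO_YIQ):
    """Apply a 3x3 linear color transform to every pixel and split the
    result into three single-channel component images."""
    converted = rgb.astype(float) @ matrix.T   # shape (H, W, 3)
    return converted[..., 0], converted[..., 1], converted[..., 2]

# A neutral gray image carries lightness only: both color components vanish.
gray = np.full((2, 2, 3), 128, dtype=np.uint8)
y, i, q = to_component_images(gray)
```

The three returned arrays play the role of the first, second, and third component images held in memories 103, 104, and 105.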

[0016] <<Step S30>> The image texture feature extraction unit 106 extracts texture features from each component image (the gray-value image of each component) stored in the memories 103, 104, and 105. Specifically, the image texture features are computed with Gabor filters, as shown in FIG. 2.

[0017] That is, let Ii(m, n) denote the value of the i-th component (i = 1, 2, 3) at pixel (m, n), and let Ii denote the image of the i-th component. A Gabor filter is a linear filter for detecting directional and periodic pattern features in an image. It is determined by four parameters (direction θ, spatial frequency u, and smoothing parameters σx and σy), and its impulse response is given by the following equation.

[0018]

(Equation 5)

Here, the unit of the spatial frequency u is 1/pixel. The smoothing parameters σx and σy are the standard deviations (in pixels) of the Gaussian functions in two orthogonal directions. The result of applying the Gabor filter to Ii(m, n) is given by the convolution Gi(m, n) = Ii(m, n) * h(m, n; u, θ, σx, σy). FIG. 4 shows an example of the impulse response of a Gabor filter for θ = 0. The values of σx and σy are set according to u and θ so that the frequency bandwidth and the orientation bandwidth take appropriate values.
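Since Equation 5 is not reproduced in this text, the following is only a sketch of one common form of the impulse response: a real, even-symmetric Gabor function, that is, a Gaussian envelope with standard deviations σx and σy multiplied by a cosine of frequency u along direction θ. That specific form, the grid half-width `half_size`, and the function name are assumptions and may differ from the patent's equation.

```python
import numpy as np

def gabor_kernel(u, theta, sigma_x, sigma_y, half_size):
    """Real, even-symmetric Gabor kernel on a (2*half_size + 1)^2 grid.

    u is in cycles per pixel, theta in radians, sigmas in pixels.
    """
    coords = np.arange(-half_size, half_size + 1)
    n, m = np.meshgrid(coords, coords)             # n: column offset, m: row offset
    x_rot = n * np.cos(theta) + m * np.sin(theta)  # axis along the modulation
    y_rot = -n * np.sin(theta) + m * np.cos(theta) # orthogonal axis
    envelope = np.exp(-0.5 * ((x_rot / sigma_x) ** 2 + (y_rot / sigma_y) ** 2))
    return envelope * np.cos(2.0 * np.pi * u * x_rot)

kernel = gabor_kernel(u=0.25, theta=0.0, sigma_x=2.0, sigma_y=2.0, half_size=4)
```

Truncating the kernel to a finite grid is what makes it usable as an FIR filter; the half-width would normally be chosen as a few multiples of the larger sigma.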

[0019] Refer to FIG. 2. First, several values of u and θ are selected (step S31). For each pair of values (u, θ), the filtered images Gi(m, n) = Ii(m, n) * h(m, n; u, θ, σx, σy), i = 1, 2, 3, which result from applying the Gabor filter to each component image, are computed (step S32).
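Steps S31 and S32 amount to a loop over components and (u, θ) pairs. The sketch below uses a plain zero-padded "same" convolution written in NumPy. The kernel form, the fixed isotropic `sigma` (the patent instead sets σx and σy from u and θ), the zero padding, and all function names are assumptions for illustration.

```python
import numpy as np

def gabor_kernel(u, theta, sigma, half_size):
    """Real, even-symmetric Gabor kernel (assumed form, isotropic sigma)."""
    c = np.arange(-half_size, half_size + 1)
    n, m = np.meshgrid(c, c)
    x_rot = n * np.cos(theta) + m * np.sin(theta)
    envelope = np.exp(-0.5 * (n ** 2 + m ** 2) / sigma ** 2)
    return envelope * np.cos(2.0 * np.pi * u * x_rot)

def convolve_same(image, kernel):
    """2-D convolution with zero padding; output has the image's shape.
    Assumes an odd-sized kernel."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image.astype(float), ((ph, ph), (pw, pw)))
    flipped = kernel[::-1, ::-1]        # convolution flips the kernel
    out = np.zeros(image.shape, dtype=float)
    for a in range(kh):
        for b in range(kw):
            out += flipped[a, b] * padded[a:a + image.shape[0],
                                          b:b + image.shape[1]]
    return out

def filter_bank(components, frequencies, angles, sigma=2.0, half_size=4):
    """Return {(i, u, theta): G_i} for every component and (u, theta) pair."""
    outputs = {}
    for i, comp in enumerate(components, start=1):
        for u in frequencies:
            for theta in angles:
                h = gabor_kernel(u, theta, sigma, half_size)
                outputs[(i, u, theta)] = convolve_same(comp, h)
    return outputs

rng = np.random.default_rng(0)
components = [rng.random((16, 16)) for _ in range(3)]   # stand-ins for I1, I2, I3
bank = filter_bank(components, frequencies=[0.125, 0.25], angles=[0.0, np.pi / 2])
```

With two frequencies and two angles, the bank holds 3 x 2 x 2 = 12 filtered images.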

[0020] Next, the value t of each pixel of each filtered image is transformed by the following equation (step S33).

(Equation 6)

[0021] Let r(m, n) denote each image made up of the transformed pixel values. The image texture feature is computed by averaging the values of the image r(m, n) over a window at pixel (m, n), as in the following equation.

(Equation 7)

Here, Wm,n is a window of size M × M centered on pixel (m, n).
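The window averaging can be sketched as a plain box mean over the M × M window, which is what the paragraph describes. Equation 7 itself is not reproduced in this text, so details such as whether an absolute value is taken first are not known here. The edge-replication border handling, the requirement that M be odd, and the function name are assumptions.

```python
import numpy as np

def window_mean(r, M):
    """Mean of r over the M x M window centered on each pixel (M odd).
    Borders are handled by edge replication (an assumption)."""
    half = M // 2
    padded = np.pad(r.astype(float), half, mode="edge")
    out = np.zeros(r.shape, dtype=float)
    for a in range(M):
        for b in range(M):
            out += padded[a:a + r.shape[0], b:b + r.shape[1]]
    return out / (M * M)

# On a linear ramp, the 3x3 mean at an interior pixel equals the pixel itself.
r = np.arange(25, dtype=float).reshape(5, 5)
e = window_mean(r, 3)
```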

[0022] A vector of the image texture features computed in this way from the three component images for each pair of (u, θ) values is assembled and passed to the classification unit 107 (step S35).

[0023] <<Step S40>> The classification unit 107 uses the image texture features extracted in the previous step to classify each pixel into a text area or a non-text area. The classification results are stored in the memory 108. Pattern classification techniques such as the nearest neighbor method or a neural network can be used for this classification. The classification unit 107 is trained in advance on training data, using the features of pixels that belong to text areas and the features of pixels that belong to non-text areas. Here, the feature is a vector of the texture features computed with Gabor filters of various frequencies u and directions θ. If M values are chosen for the frequency u and N values for the angle θ, the feature vector has 3MN dimensions.
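As an illustration of the nearest neighbor option, the sketch below labels each pixel's feature vector with the label of its closest training vector. The training data is synthetic, and the function name, the label convention (1 = text, 0 = non-text), and the squared Euclidean distance are assumptions. It also makes the dimension count concrete: three components times M frequencies times N angles.

```python
import numpy as np

def nearest_neighbor_classify(features, train_features, train_labels):
    """Label each row of `features` with the label of its closest training
    row, measured by squared Euclidean distance."""
    diffs = features[:, None, :] - train_features[None, :, :]
    dist2 = np.sum(diffs ** 2, axis=2)
    return train_labels[np.argmin(dist2, axis=1)]

M_freqs, N_angles = 2, 4             # number of u values and theta values
dim = 3 * M_freqs * N_angles         # 3 components x M x N = 24 dimensions

# Toy training set (synthetic values): label 1 = text, label 0 = non-text
train_features = np.vstack([np.zeros(dim), np.ones(dim)])
train_labels = np.array([0, 1])

pixels = np.vstack([np.full(dim, 0.1), np.full(dim, 0.9)])
labels = nearest_neighbor_classify(pixels, train_features, train_labels)
```

The pairwise distance array grows with (pixels x training samples x dimensions), so a real image would be classified in chunks or with a spatial index.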

[0024] <<Step S50>> The text area extraction unit 109 refers to the per-pixel classification results stored in the memory 108 and extracts the connected components of the pixels classified as text as the text areas of the input digital color image.
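Connected component extraction can be sketched with a breadth-first flood fill over the per-pixel classification mask. The patent does not state the connectivity or the output representation; 8-connectivity, bounding boxes as output, and the function names are assumptions.

```python
from collections import deque
import numpy as np

def connected_components(mask):
    """Label 8-connected regions of True pixels; returns (label image, count)."""
    labels = np.zeros(mask.shape, dtype=int)
    count = 0
    rows, cols = mask.shape
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0, c0] or labels[r0, c0]:
                continue
            count += 1                       # start a new component
            labels[r0, c0] = count
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < rows and 0 <= cc < cols
                                and mask[rr, cc] and not labels[rr, cc]):
                            labels[rr, cc] = count
                            queue.append((rr, cc))
    return labels, count

def bounding_boxes(labels, count):
    """(top, left, bottom, right) of each labeled component, inclusive."""
    boxes = []
    for k in range(1, count + 1):
        rs, cs = np.nonzero(labels == k)
        boxes.append((int(rs.min()), int(cs.min()), int(rs.max()), int(cs.max())))
    return boxes

# Classification result: True = pixel classified as text
mask = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
], dtype=bool)
labels, count = connected_components(mask)
boxes = bounding_boxes(labels, count)
```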

[0025] The method and apparatus of the present invention described above use the image texture features of each component of the digital color image. A text area printed on a colored background, which was the problem with the prior art, can therefore be extracted accurately even when the background and the text have similar gray values after conversion to a grayscale image, and even when the gray value of the background is close to black. In particular, because the image is converted, before the image texture features are extracted, to color coordinates that reflect human visual characteristics and whose components are highly independent, text printed in a color that human vision distinguishes quite clearly from the background color, but that differs only slightly from it in the RGB data, can also be extracted reliably. The extracted text areas also come closer to the text areas that a human would recognize. In other words, text area extraction closer to human color perception becomes possible.

[0026] The image texture feature extraction in step S30 need not use Gabor filters. However, as noted above, Gabor filters are easy to implement in hardware and are therefore advantageous in processing speed.

[0027] Although not illustrated, in another embodiment the color coordinate conversion of step S20 is omitted, and the image texture features of the R, G, and B components of the input digital color image are extracted and used in the classification step S40. This also makes it possible to reliably extract text areas printed on a colored background. As noted above, however, going through the color coordinate conversion of step S20 allows text area extraction closer to human color perception.

[0028] The present invention can also be implemented in software on a general-purpose computer such as that shown in FIG. 5, in which a CPU 200, a memory 201, a display device 202, a user input device 203 such as a keyboard and mouse, a hard disk device 204, a medium drive device 206 for reading and writing a recording medium 205 such as a floppy disk, magneto-optical disk, or CD-ROM, and external interface units 209 and 210 for connecting to an image input device 207 such as an image scanner and to a communication line 208 are connected by a system bus 211.

[0029] A program for causing a computer to execute each step of the text area extraction processing of the present invention described with reference to FIGS. 1 and 2, or a program for realizing, using the computer's hardware, the function of each means of the text area extraction apparatus of the present invention described with reference to FIG. 3, is, for example, read into the memory 201 from the recording medium 205 on which it is recorded and then executed. Alternatively, the program is stored in advance on the hard disk device 204 and is read into the memory 201 when it is executed.

[0030] The digital color image to be processed is loaded into the computer from, for example, the recording medium 205, or from the image input device 207 or the communication line 208 connected via the external interface units 209 and 210. The text area extraction result data is obtained in the memory 201 and is then output to the hard disk device 204 or the recording medium 205, displayed on the display device 202, or output to the communication line 208.

[0031]

[Effects of the Invention] As is clear from the above description, the inventions of claims 1 to 6 make it possible to extract text areas printed on a colored background. Text areas can be extracted reliably even in the cases that were most problematic for the prior art: when the background and the text have similar gray values after conversion to a grayscale image, and when the gray value of the background is close to black. The invention of claim 2 or 5 further enables highly accurate text area extraction that is close to human color perception. The invention of claim 3 or 6 makes it easy to speed up image texture feature extraction and is therefore advantageous for speeding up the text area extraction processing as a whole. The invention of claim 7 or 8 allows the text area extraction processing described above to be carried out easily on a general-purpose computer.

[Brief description of the drawings]

FIG. 1 is a flowchart showing an example of the text area extraction procedure according to the present invention.

FIG. 2 is a flowchart showing an example of the image texture feature extraction procedure.

FIG. 3 is a block diagram showing an example of the configuration of a text area extraction apparatus according to the present invention.

FIG. 4 is a diagram showing an example of the impulse response of a Gabor filter for direction θ = 0.

FIG. 5 is a block diagram showing an example of a computer for implementing the present invention in software.

[Explanation of symbols]

100 Color image input unit
101 Digital color image storage memory
102 Color coordinate conversion unit
103 First component image storage memory
104 Second component image storage memory
105 Third component image storage memory
106 Image texture feature extraction unit
107 Classification unit
108 Classification result storage memory
109 Text area extraction unit

Continued from the front page

(51) Int.Cl.7 / FI: H04N 1/46 Z
F-terms (reference): 5C077 MP06 MP08 PP01 PP27 PP31 PP32 PP43 PQ22 5C079 HB01 HB11 LA01 LA06 MA01 NA29 5C082 AA27 BA02 BA20 BA27 BA34 CA21 CA54 CB01 DA87 5L096 AA02 AA06 AA07 BA07 BA17 EA26 FA44 FA81 GA40 GA55 9A001 HH28 HH31 JJ28

Claims

[Claims]

1. A text area extraction method for extracting a text area from a digital color image to be processed in which the color of each pixel is represented in certain color coordinates, the method comprising: extracting an image texture feature for each component of the color coordinates from the digital color image to be processed; and classifying each pixel of the digital color image to be processed into a text area or a non-text area using the extracted image texture features.

2. A text area extraction method for extracting a text area from a digital color image to be processed in which the color of each pixel is represented in certain color coordinates, the method comprising: converting the digital color image to be processed to other color coordinates that reflect human visual characteristics and whose components are highly independent; extracting an image texture feature for each component of the other color coordinates from the converted digital color image; and classifying each pixel of the digital color image to be processed into a text area or a non-text area using the extracted image texture features.

3. The method according to claim 1, wherein a Gabor filter is used for extracting image texture features.

4. A text area extraction apparatus comprising: means for inputting a digital color image in which the color of each pixel is represented in certain color coordinates; means for extracting an image texture feature for each component of the color coordinates from the digital color image input by the inputting means; means for classifying each pixel of the input digital color image into a text area or a non-text area using the image texture features extracted by the extracting means; and means for extracting, as a text area, the connected components of the pixels classified into the text area by the classifying means.

5. A text area extraction apparatus comprising: means for inputting a digital color image in which the color of each pixel is represented in certain color coordinates; means for converting the input digital color image to other color coordinates that reflect human visual characteristics and whose components are highly independent; means for extracting an image texture feature for each component of the other color coordinates from the digital color image converted by the converting means; means for classifying each pixel of the input digital color image into a text area or a non-text area using the image texture features extracted by the extracting means; and means for extracting, as a text area, the connected components of the pixels classified into the text area by the classifying means.

6. The text region extracting apparatus according to claim 4, wherein a Gabor filter is used for extracting image texture features.

7. A computer-readable recording medium on which is recorded a program for causing a computer to execute: a step of inputting a digital color image in which the color of each pixel is represented in certain color coordinates; a step of extracting an image texture feature for each component of the color coordinates from the digital color image input in the inputting step; a step of classifying each pixel of the input digital color image into a text area or a non-text area using the image texture features extracted in the extracting step; and a step of extracting, as a text area, the connected components of the pixels classified into the text area in the classifying step.

8. A computer-readable recording medium on which is recorded a program for causing a computer to execute: a step of inputting a digital color image in which the color of each pixel is represented in certain color coordinates; a step of converting the input digital color image to other color coordinates that reflect human visual characteristics and whose components are highly independent; a step of extracting an image texture feature for each component of the other color coordinates from the digital color image converted in the converting step; a step of classifying each pixel of the input digital color image into a text area or a non-text area using the image texture features extracted in the extracting step; and a step of extracting, as a text area, the connected components of the pixels classified into the text area in the classifying step.