JPH08202819A

JPH08202819A - Underline extraction method

Info

Publication number: JPH08202819A
Application number: JP7010469A
Authority: JP
Inventors: Teruo Akiyama; 照雄秋山
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1995-01-26
Filing date: 1995-01-26
Publication date: 1996-08-09
Anticipated expiration: 2016-03-19
Also published as: JP3147281B2

Abstract

PURPOSE: To accurately extract an underline written in a document image even if it is tilted or faint, or is in contact with a character. CONSTITUTION: In a document input stage 1, a document is sampled and the value of each pixel is quantized to a binary value, '1' for black and '0' for white. In a pixel replacement stage 2, for each group of a prescribed number of consecutive pixels, the white pixels are replaced with black pixels if the group contains one or more black pixels. In a vertical black run extension stage 3, extension processing is applied to black runs, in which black pixels continue in the vertical direction. In a horizontal black run extraction stage 4, horizontal black runs of at least a prescribed length are extracted from the image obtained in the vertical black run extension stage 3. In a vertical black run degeneration stage 5, degeneration processing is executed the same number of times as the extension processing executed in the vertical black run extension stage 3. In an underline candidate extraction stage 6, underline candidates are extracted. In a lower contour extraction stage 7, the lower contour of each underline candidate is extracted. In an underline extraction stage 8, the lower contour is scanned from its left end and the underline is extracted.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a method for extracting underlines printed or handwritten below character strings in a document.

[0002]

[Prior Art] Conventionally, when keywords or the like written in a document are recognized by OCR, the position of the characters to be recognized has been indicated to the OCR by drawing an underline beneath the keyword. One conceivable way to extract such an underline, shown in FIG. 17, is to extract a single group of black-pixel runs whose run length is at least a certain value and to treat that figure as the underline. In FIG. 17, the horizontal black runs whose length exceeds a fixed value are gathered into one figure and extracted as the underline. As a result, the rectangular area enclosed by the broken line is extracted as the underline area.
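The run-length approach of FIG. 17 can be illustrated with a minimal sketch. The function name `horizontal_runs`, the sample rows, and the threshold of 6 pixels are assumptions made for illustration and do not come from the source. The sketch shows why a single faint (white) pixel is enough to defeat a plain run-length test.

```python
def horizontal_runs(row, min_len):
    """Return (start, end) pairs (end exclusive) of black runs with at least min_len pixels."""
    runs = []
    start = None
    # A trailing 0 is appended so that a run touching the right edge is closed.
    for x, pixel in enumerate(list(row) + [0]):
        if pixel == 1 and start is None:
            start = x
        elif pixel != 1 and start is not None:
            if x - start >= min_len:
                runs.append((start, x))
            start = None
    return runs


solid = [0, 1, 1, 1, 1, 1, 1, 1, 1, 0]   # clean underline, 8 black pixels
broken = [0, 1, 1, 1, 0, 1, 1, 1, 1, 0]  # same line with one faint (white) pixel

print(horizontal_runs(solid, 6))   # [(1, 9)]
print(horizontal_runs(broken, 6))  # []
```

The second call returns no run because the gap splits the line into runs of 3 and 4 pixels, both below the assumed threshold.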

[0003]

[Problems to Be Solved by the Invention] This method was effective for underlines output by a word processor or the like and for underlines drawn horizontally with a ruler. However, it had the drawback that extraction becomes difficult for underlines drawn freehand, which waver because of hand tremor, and for underlines that are straight but were input at a tilt. It also had the drawback that extraction fails when white pixels caused by faint strokes or the like interrupt a black run. One method that compensates for these drawbacks, shown in FIG. 18, traces contours and extracts as an underline any figure whose contour points have an elongated circumscribed rectangle. As with the method of FIG. 17, the method of FIG. 18 extracts the portion enclosed by the broken line as the underline. However, as shown in FIG. 19, when a character written above the underline touches it, the contour tracing extends into the character, which makes the underline difficult to extract.

[0004] An object of the present invention is to provide an underline extraction method that can accurately extract an underline written in a document image even if the underline is tilted, wavers because of hand tremor, or touches a character, and that can also cope to some extent with faint strokes introduced at input and with broken lines.

[0005]

[Means for Solving the Problems] The underline extraction method of the present invention comprises the following stages:
a document input stage of inputting a document and converting the input document, by sampling and quantization, into a document image represented by binary values;
a pixel replacement stage of scanning the document image obtained in the document input stage and, for each group of a fixed number of horizontally consecutive pixels, replacing all white pixels in the group with black pixels when the group contains at least a predetermined number of black pixels;
a vertical black run extension stage of applying, one or more times, extension processing to black runs of vertically consecutive black pixels in the document image obtained in the pixel replacement stage;
a horizontal black run extraction stage of extracting, from the document image obtained in the vertical black run extension stage, horizontal black runs whose length is within a certain range;
a vertical black run degeneration stage of applying, to the document image obtained in the horizontal black run extraction stage, degeneration processing of black runs of vertically consecutive black pixels the same number of times as the extension processing performed in the vertical black run extension stage;
an underline candidate extraction stage of extracting an underline candidate image by taking the logical product of the document image obtained in the vertical black run degeneration stage and the pixel-replaced image obtained in the pixel replacement stage;
a lower contour extraction stage of extracting the lower contour of the underline by applying logical operations to the underline candidate image obtained in the underline candidate extraction stage; and
an underline extraction stage of extracting, as an underline, a sequence of contour points that continues for at least a certain length among the lower contour points obtained in the lower contour extraction stage.
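As an illustration of the pixel replacement stage, the following is a minimal sketch, not the patented implementation. It assumes that the groups are non-overlapping and aligned to the left edge of each row, which the source does not specify; the names `replace_pixels`, `group_size`, and `min_black` are hypothetical.

```python
def replace_pixels(image, group_size=4, min_black=1):
    """Fill each horizontal group of group_size pixels with black (1)
    when it already holds at least min_black black pixels.

    Assumption: groups do not overlap and start at column 0.
    """
    result = []
    for row in image:
        new_row = list(row)
        for start in range(0, len(row), group_size):
            group = row[start:start + group_size]
            if sum(group) >= min_black:
                new_row[start:start + len(group)] = [1] * len(group)
        result.append(new_row)
    return result


faint_line = [[0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0]]
print(replace_pixels(faint_line))
# [[0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]]
```

The two middle groups each contain a black pixel and are filled, so the gaps inside the faint line disappear, while the all-white groups at either end are left unchanged.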

[0006] The lower contour extraction processing is performed by shifting the vertical black pixel groups of the underline candidate image upward by one pixel, obtaining an image that is the exclusive OR of the image before the shift and the image after the shift, and then taking the logical product of that image and the inverse of the shifted image.
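The shift, exclusive OR, and logical product described here can be sketched as follows. The image is assumed to be a list of rows ordered top to bottom, and the row vacated at the bottom by the upward shift is assumed to be filled with white; the function names are hypothetical.

```python
def shift_up(image):
    """Move the whole image up by one pixel; the vacated bottom row becomes white (0)."""
    width = len(image[0])
    return [list(row) for row in image[1:]] + [[0] * width]


def lower_contour(image):
    """Keep only the bottom pixel of every vertical black pixel group."""
    shifted = shift_up(image)
    contour = []
    for row, shifted_row in zip(image, shifted):
        # (original XOR shifted) AND (NOT shifted)
        contour.append([(a ^ b) & (1 - b) for a, b in zip(row, shifted_row)])
    return contour


# One-pixel-wide image, listed top to bottom, with a black group three pixels high.
column = [[0], [0], [1], [1], [1], [0], [0]]
print(lower_contour(column))
# [[0], [0], [0], [0], [1], [0], [0]]
```

In this example the exclusive OR alone marks two pixels, the one just above the group and the bottom pixel of the group; the logical product with the inverted shifted image removes the former, leaving only the lower contour point.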

[0007] Another underline extraction method of the present invention further comprises a double underline extraction process of extracting a double underline written or printed below a character string of the document image by searching for lower contour points of another underline located within a certain distance above the underline extracted in the underline extraction stage.

[0008]

[Operation] For the original document, which is represented in binary, each group of a fixed number of consecutive pixels is examined. If the group contains at least a predetermined number of black pixels, all of its white pixels are replaced with black pixels, producing a pixel-replaced image in which the group is a continuous run of black pixels. This allows the underline to be extracted stably even if it is drawn as a broken line or is faint. Next, so that horizontal black runs of a certain length can be extracted stably even when the underline is tilted, the vertical black runs are extended, horizontal black runs of a certain length are extracted, and the vertical black runs are then degenerated. Next, underline candidate regions are extracted by taking the logical product of the degenerated image and the pixel-replaced image. The lower contour points of the underline candidate regions are then extracted by a combination of image shifts and logical operations, and the lower contour points are traced to extract the underline.
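To see why extending the vertical black runs helps with a tilted underline, consider the following minimal sketch. The image layout (rows listed top to bottom), the helper names, and the run-length threshold of 6 are assumptions made for illustration only.

```python
def shift_down(image):
    """Move the whole image down by one pixel; the vacated top row becomes white (0)."""
    width = len(image[0])
    return [[0] * width] + [list(row) for row in image[:-1]]


def extend_down(image, times=1):
    """Extend every vertical black run downward by one pixel per pass (shift, then OR)."""
    for _ in range(times):
        shifted = shift_down(image)
        image = [[a | b for a, b in zip(r, s)] for r, s in zip(image, shifted)]
    return image


def keep_long_runs(image, min_len):
    """Keep only horizontal black runs of at least min_len pixels."""
    result = []
    for row in image:
        new_row = [0] * len(row)
        start = None
        for x, pixel in enumerate(list(row) + [0]):  # trailing 0 closes a run at the edge
            if pixel and start is None:
                start = x
            elif not pixel and start is not None:
                if x - start >= min_len:
                    new_row[start:x] = [1] * (x - start)
                start = None
        result.append(new_row)
    return result


# A tilted line: the right half sits one row lower than the left half.
tilted = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

print(keep_long_runs(tilted, 6)[2])                  # [0, 0, 0, 0, 0, 0, 0, 0]
print(keep_long_runs(extend_down(tilted, 1), 6)[2])  # [1, 1, 1, 1, 1, 1, 1, 1]
```

Without extension, neither half of the tilted line reaches the threshold. After one downward extension the left half overlaps the row of the right half, so a single run of 8 pixels is found.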

[0009] Therefore, an underline written in a document image can be extracted accurately even if it is tilted, wavers because of hand tremor, or touches a character.

[0010] The present invention can extract not only handwritten underlines written in a document but also underlines and ruled lines printed in the document, by exactly the same method. In addition, vertical ruled lines as well as horizontal ruled lines can be extracted by the same method.

[0011]

[Embodiment] Next, an embodiment of the present invention will be described with reference to the drawings.

[0012] FIG. 1 is a flowchart showing an underline extraction method according to an embodiment of the present invention.

[0013] The underline extraction method of this embodiment consists of a document input stage 1, a pixel replacement stage 2, a vertical black run extension stage 3, a horizontal black run extraction stage 4, a vertical black run degeneration stage 5, an underline candidate extraction stage 6, a lower contour extraction stage 7, and an underline extraction stage 8.

[0014] In the document input stage 1, a document is sampled using a facsimile machine, a scanner, or the like, and the value of each pixel is quantized to a binary value, 1 for black and 0 for white, to create an original image.

In the pixel replacement stage 2, for each group of a fixed number of horizontally consecutive pixels in the original image, all white pixels in the group are replaced with black pixels if the group contains one or more black pixels.

In the vertical black run extension stage 3, so that horizontal black runs can be extracted stably even when the underline is tilted, the black pixels of the input image are thickened (extended) by one pixel vertically downward.

In the horizontal black run extraction stage 4, horizontal black runs of at least a certain length are extracted from the image obtained in the vertical black run extension stage 3.

The vertical black run degeneration stage 5 returns the extended vertical black runs to their original state; degeneration processing is performed the same number of times as the extension processing performed in the vertical black run extension stage 3. Degeneration is realized by shifting the entire image by one pixel in the direction opposite to the extension performed in the vertical black run extension stage 3, that is, upward, and taking the logical product with the image before the shift. To degenerate further, this processing is repeated.

In the underline candidate extraction stage 6, underline candidates are extracted by taking the logical product of the image obtained in the vertical black run degeneration stage 5 and the image obtained in the pixel replacement stage 2. The logical product is taken because, when an underline lies close to a ruled line, the underline can come into contact with the ruled line in the vertical black run extension stage 3, and the two may remain joined even after the vertical black run degeneration stage 5.

In the lower contour extraction stage 7, the lower contour of the underline candidates obtained in the underline candidate extraction stage 6 is extracted.

In the underline extraction stage 8, the pixel sequences obtained in the lower contour extraction stage 7 are scanned from the left end, those of at least a certain length are selected, and the entire figure containing all of those lower contour points is extracted as an underline.
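The role of the logical product in stage 6 can be illustrated with a small sketch. The images below are hypothetical: `replaced` stands for the output of the pixel replacement stage with a ruled line one white row above an underline, and `extended` stands for the image after one downward extension and run extraction, in which the gap between the two lines has been filled. The helper names are also assumptions.

```python
def shift_up(image):
    """Move the whole image up by one pixel; the vacated bottom row becomes white (0)."""
    width = len(image[0])
    return [list(row) for row in image[1:]] + [[0] * width]


def degenerate(image, times=1):
    """Remove the bottom pixel of every vertical black run per pass (shift up, then AND)."""
    for _ in range(times):
        shifted = shift_up(image)
        image = [[a & b for a, b in zip(r, s)] for r, s in zip(image, shifted)]
    return image


def logical_and(first, second):
    """Pixel-wise logical product of two images of equal size."""
    return [[a & b for a, b in zip(r, s)] for r, s in zip(first, second)]


replaced = [            # assumed output of the pixel replacement stage
    [0, 0, 0, 0],
    [1, 1, 1, 1],       # ruled line
    [0, 0, 0, 0],       # one-pixel gap
    [1, 1, 1, 1],       # underline
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
extended = [            # assumed result of one downward extension plus run extraction
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],       # gap filled by the extension
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

degenerated = degenerate(extended, 1)
print(degenerated[2])                               # [1, 1, 1, 1]  (lines still joined)
print(logical_and(degenerated, replaced)[2])        # [0, 0, 0, 0]  (gap restored)
```

Degeneration only removes the bottom row of the merged block, so the gap stays black; the logical product with the pixel-replaced image brings the two lines apart again.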

[0015] Next, the specific operation of the underline extraction method of this embodiment will be described with reference to FIGS. 2 to 15.

[0016] FIG. 2 shows an example of a document image binarized in the document input stage 1. The underline has been input at a tilt and contains white pixels caused by faint strokes.

In the pixel replacement stage 2, if, for example, the four horizontally consecutive pixels whose left end is marked △ in FIG. 2 contain even one black pixel, all white pixels in this pixel group are replaced with black pixels. The original image shown in FIG. 2 thereby becomes the image shown in FIG. 3. As a result, the underline can be extracted stably even if it is drawn as a broken line or is faint.

In the next vertical black run extension stage 3, black run extension processing is performed in which each vertical black pixel group is extended vertically downward by one pixel. For example, the vertical black pixel group indicated by the arrow in FIG. 3 (FIG. 4(a)) is shifted down by one pixel (FIG. 4(b)) and the logical sum with the image before the shift (FIG. 4(a)) is taken (FIG. 4(c)), which realizes the vertical black run extension processing. FIG. 5 shows the image after the vertical black run extension processing. To extend further, this processing is repeated.

In the horizontal black run extraction stage 4, horizontal black runs of, for example, length 10 or more are extracted from the image obtained by the vertical black run extension processing (FIG. 6). In FIG. 6, the horizontal black runs shown in FIG. 3 are indicated by wavy underlines. FIG. 7, by contrast, shows the result of applying the vertical black run extension processing twice to the image shown in FIG. 3 and likewise extracting black runs of length 10 or more. It can be seen that the horizontal black runs are extracted more stably than in FIG. 6. In this example the original image is extended only downward, but it may instead be extended upward or in both directions. Also, although a one-pixel extension is performed several times here, an extension of two or more pixels may be performed in a single pass.

In the vertical black run degeneration stage 5, the bottom black pixel of each vertical black pixel group in the image obtained in the horizontal black run extraction stage 4 is changed to a white pixel. FIG. 8 shows the processing of the vertical black run degeneration stage 5. That is, the vertical pixel group (FIG. 8(a)) is shifted upward by one pixel (FIG. 8(b)), and the logical product of this and the original pixel group (FIG. 8(a)) is taken (FIG. 8(c)). FIG. 9 shows the result of applying the vertical black run degeneration processing twice to the horizontal black runs extracted in the horizontal black run extraction stage 4 (FIG. 7). It can be seen that, as a result of the extension and degeneration processing, the horizontal black runs in FIG. 3 that are shorter than 10 pixels have also been extracted.

In the underline candidate extraction stage 6, the logical product of the image obtained in the vertical black run degeneration stage 5 and the image obtained in the pixel replacement stage 2 is taken. This is because, when an underline lies close to a figure such as a ruled line, the underline may remain joined to that figure even after the degeneration processing. This is explained concretely with reference to the drawings. FIG. 10 is an example in which a ruled line exists close to the underline shown in FIG. 3. FIG. 11 shows the result of extending the vertical black runs for this underline, and FIG. 12 shows the result of the degeneration processing. As shown, the underline and the ruled line come into contact and become a single figure. Even in such a case, taking the logical product of the image of FIG. 10 and the image of FIG. 12 makes it possible to restore the portions joined by the extension and degeneration processing to their state in FIG. 10, while still extracting ruled lines stably by run length.

Next, in the lower contour extraction stage 7, the lower contour of the underline candidates obtained in the underline candidate extraction stage 6 is extracted. FIG. 13 shows the principle of lower contour extraction. FIG. 13(a) shows the vertical column of pixels at the position marked △ in FIG. 9. First, an image is created by shifting the image of FIG. 13(a) upward by one pixel (FIG. 13(b)), and the exclusive OR with the original image (FIG. 13(a)) is taken (FIG. 13(c)). This logical operation yields an image in which the third and sixth pixels from the bottom of the image (FIG. 13(a)) are black. Taking the logical product of this image (FIG. 13(c)) and the inverse (FIG. 13(d)) of the shifted image (FIG. 13(b)) yields an image in which only the third pixel from the bottom of the original image (FIG. 13(a)) is black (FIG. 13(e)). In this way, the lower contour points of the underline can be obtained from the original image using only logical operations between images. By applying this combination of logical operations to the image obtained in the underline candidate extraction stage 6, the lower contour points of the underline, marked ◎ in FIG. 14, are extracted.

In the final underline extraction stage 8, as shown in FIG. 15, the left end (▲) of the contour points obtained in the lower contour extraction stage 7 is detected first. Next, the lower contour points are traced to the right (in the direction of the arrow). If no lower contour point continues to the right, tracing moves to a different row. In FIG. 15, the row being traced is moved up by one and tracing continues. Depending on the image, the lower contour points may jump by two or more rows. In this case a fixed value is set, and lower contour points whose step difference falls within that value are traced as a continuous sequence. If this value is set too large, underlines that are actually separate will be extracted as a single underline. When tracing is complete, the number of consecutive lower contour pixels is counted, and for sequences of at least a certain length, the entire figure containing all of those lower contour points is extracted as an underline.
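The tracing in the underline extraction stage 8 might be sketched as below. This is only an illustration under stated assumptions: the source does not say in which order neighbouring rows are tried, so the sketch tries the current row first and then rows at increasing distance, checking the upper row before the lower one. The names `trace_lower_contour` and `max_step` and the minimum length of 5 are hypothetical.

```python
def trace_lower_contour(contour, max_step=1):
    """Trace rightward from the leftmost contour point.

    contour is a binary image of lower contour points (rows listed top to bottom).
    max_step is the largest row jump still treated as continuous (an assumed parameter).
    Returns the visited points as (x, y) pairs.
    """
    height, width = len(contour), len(contour[0])
    start = None
    for x in range(width):                      # find the left end of the contour
        for y in range(height):
            if contour[y][x]:
                start = (x, y)
                break
        if start is not None:
            break
    if start is None:
        return []

    path = [start]
    x, y = start
    while x + 1 < width:
        next_y = None
        for distance in range(max_step + 1):    # same row first, then nearby rows
            for candidate in (y - distance, y + distance):
                if 0 <= candidate < height and contour[candidate][x + 1]:
                    next_y = candidate
                    break
            if next_y is not None:
                break
        if next_y is None:
            break                               # no continuation within max_step rows
        x, y = x + 1, next_y
        path.append((x, y))
    return path


contour = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

path = trace_lower_contour(contour, max_step=1)
print(len(path))                                         # 6
print(len(path) >= 5)                                    # True: long enough to be an underline
print(len(trace_lower_contour(contour, max_step=0)))     # 3
```

With a step tolerance of one row the tilted contour is followed across the row change; with a tolerance of zero the trace stops at the step and the sequence is too short to qualify.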

[0017] When a document contains linear figures such as ruled lines that must be distinguished from underlines, the underline may be drawn as a double line. FIG. 16 shows a double underline. In FIG. 16, if lower contour points (◆) of another underline exist within a certain distance above the extracted lower contour points (◎), the two are extracted together as a double underline, which makes it possible to distinguish them from ordinary ruled lines in the document. As shown in FIG. 16, when the upper line of the double underline is wider than the lower line, the search for lower contour pixels may be continued toward both ends, as indicated by the arrows in FIG. 16, to cover the portions that extend beyond the lower line.
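A possible form of the double underline check is sketched below. The source only states that lower contour points of another underline are searched for within a certain distance above the extracted underline; the requirement that at least half of the traced points find a partner (`min_ratio`), as well as the names `has_upper_line` and `max_gap`, are assumptions added for this illustration. The search toward both ends for a wider upper line is not covered.

```python
def has_upper_line(contour, path, max_gap=4, min_ratio=0.5):
    """Return True when enough points of a traced underline have another
    lower contour point within max_gap rows directly above them.

    contour: binary image of lower contour points (rows listed top to bottom).
    path: (x, y) points of an already extracted underline.
    min_ratio: assumed fraction of points that must find a partner.
    """
    if not path:
        return False
    hits = 0
    for x, y in path:
        for dy in range(1, max_gap + 1):
            if y - dy >= 0 and contour[y - dy][x]:
                hits += 1
                break
    return hits >= min_ratio * len(path)


double_line = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],   # lower contour of the upper line
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],   # lower contour of the lower line
    [0, 0, 0, 0, 0, 0],
]
single_line = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
lower_path = [(1, 3), (2, 3), (3, 3), (4, 3)]

print(has_upper_line(double_line, lower_path, max_gap=3))  # True
print(has_upper_line(single_line, lower_path, max_gap=3))  # False
print(has_upper_line(double_line, lower_path, max_gap=1))  # False
```

The last call shows the effect of the distance limit: with a search range of one row, the upper line two rows above is not reached and the figure is treated as a single line.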

[0018] The present invention has been described above using, as an example, a method for extracting underlines handwritten below characters or the like in a binary document image input from a scanner, a facsimile machine, or the like. However, the present invention is not limited to handwritten lines, and it is clear that underlines and ruled lines printed in a document can be extracted by exactly the same method. It is also clear that not only underlines written or printed below character strings but also any linear figure, such as a horizontal or vertical ruled line, can be extracted by the same method.

[0019]

[Effects of the Invention] As described above, the present invention has the following effects. (1) The invention of claim 1 can accurately extract an underline written in a document image even if the underline is tilted, wavers because of hand tremor, or touches a character. It can also cope to some extent with faint strokes introduced at input and with broken lines. (2) The invention of claim 3 can extract double underlines.

[Brief description of drawings]

FIG. 1 is a flowchart showing an underline extraction method according to an embodiment of the present invention.

FIG. 2 is a diagram showing an example of a binarized document image.

FIG. 3 is a diagram showing a document image after pixel replacement.

FIG. 4 is a diagram showing vertical black run extension processing.

FIG. 5 is a diagram showing an image after vertical black run extension processing.

FIG. 6 is a diagram showing an example in which black runs of length 10 or more are extracted.

FIG. 7 is a diagram showing an example in which vertical black run extension processing is applied twice and black runs of length 10 or more are extracted.

FIG. 8 is a diagram showing vertical black line extension processing.

FIG. 9 is a diagram showing horizontal black runs extracted by two rounds of vertical black run extension and degeneration processing.

FIG. 10 is a diagram showing an example of an underline and a ruled line that are close to each other.

FIG. 11 is a diagram showing the underline and ruled line after extension processing.

FIG. 12 is a diagram showing the underline and ruled line in contact after degeneration processing.

FIG. 13 is a diagram showing lower contour extraction processing.

FIG. 14 is a diagram showing an example of extracting underline candidates.

FIG. 15 is a diagram showing underline candidate extraction processing.

FIG. 16 is a diagram showing an example of extracting a double underline.

FIG. 17 is a diagram showing underline extraction using run length.

FIG. 18 is a diagram showing underline extraction using a circumscribed rectangle.

FIG. 19 is a diagram showing an example in which underline extraction using a circumscribed rectangle is difficult.

[Explanation of symbols]

1 Document input stage
2 Pixel replacement stage
3 Vertical black run extension stage
4 Horizontal black run extraction stage
5 Vertical black run degeneration stage
6 Underline candidate extraction stage
7 Lower contour extraction stage
8 Underline extraction stage

Claims

[Claims]

1. An underline extraction method for extracting an underline printed or handwritten below a character string in a document, the method comprising:
a document input stage of inputting a document and converting the input document, by sampling and quantization, into a document image represented by binary values;
a pixel replacement stage of scanning the document image obtained in the document input stage and, for each group of a fixed number of horizontally consecutive pixels, replacing all white pixels in the group with black pixels when the group contains at least a predetermined number of black pixels;
a vertical black run extension stage of applying, one or more times, extension processing to black runs of vertically consecutive black pixels in the document image obtained in the pixel replacement stage;
a horizontal black run extraction stage of extracting, from the document image obtained in the vertical black run extension stage, horizontal black runs whose length is within a certain range;
a vertical black run degeneration stage of applying, to the document image obtained in the horizontal black run extraction stage, degeneration processing of black runs of vertically consecutive black pixels the same number of times as the extension processing performed in the vertical black run extension stage;
an underline candidate extraction stage of extracting an underline candidate image by taking the logical product of the document image obtained in the vertical black run degeneration stage and the pixel-replaced image obtained in the pixel replacement stage;
a lower contour extraction stage of extracting the lower contour of the underline by applying logical operations to the underline candidate image obtained in the underline candidate extraction stage; and
an underline extraction stage of extracting, as an underline, a sequence of contour points that continues for at least a certain length among the lower contour points obtained in the lower contour extraction stage.

2. The underline extraction method according to claim 1, wherein the lower contour extraction processing is performed by shifting the vertical black pixel groups of the underline candidate image upward by one pixel, obtaining an image that is the exclusive OR of the image before the shift and the image after the shift, and taking the logical product of that image and an image obtained by inverting the shifted image.

3. The underline extraction method according to claim 1, further comprising a double underline extraction process of extracting a double underline written or printed below a character string of the document image by searching for lower contour points of another underline located within a certain distance above the underline extracted in the underline extraction stage.