JP2003317107A

JP2003317107A - Method and device for ruled-line detection

Info

Publication number: JP2003317107A
Application number: JP2002125378A
Authority: JP
Inventors: Atsuko Obara; 敦子小原; Katsuto Fujimoto; 克仁藤本; Satoshi Naoi; 聡直井
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2002-04-26
Filing date: 2002-04-26
Publication date: 2003-11-07

Abstract

PROBLEM TO BE SOLVED: To provide a ruled-line extraction method that correctly extracts ruled lines from a scanned image and reproduces them in document data.

SOLUTION: When a binary image is input, a reduced image is generated by OR thinning, connected components of black pixels are extracted by labeling, and mask processing is applied to extract straight-line components. Line segments are then extracted, cells are extracted, the ruled lines that form nested structures are determined, and double lines are identified. Finally, diagonal lines are extracted.

COPYRIGHT: (C)2004,JPO

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a method and an apparatus for accurately extracting ruled lines and similar elements from an image.

[0002]

[Related Art] In recent years, as input devices such as scanners have become widespread, opportunities to handle image data have increased, and with them the demand for OCR (Optical Character Reader: a character recognition device or software) for reading documents and for document restoration.

[0003] When a document image contains a double line, the two ruled lines may lie close together and be partly connected, or they may be separated. Conventional ruled-line extraction methods do not extract double lines as such but treat them as solid lines, so a double line is sometimes extracted as one ruled line and sometimes as two, and the behavior is unstable.

[0004] When cells are nested in multiple levels, the structure is not known in advance, so the nesting depth is also unknown. If the recursive processing that extracts nested cells is allowed to run an unlimited number of times, ruled lines erroneously extracted from characters inside a cell can cause nested structures to be extracted excessively. If the number of recursions is fixed instead, some structures cannot be processed.

[0005] When a diagonal line is drawn between the vertices of a cell in a table in a document image, conventional methods assume that the diagonal line lies on the straight line joining the two vertices, and they judge that a diagonal line exists when the black pixels on that straight line account for at least a fixed proportion of its length. However, a diagonal line does not always join the vertices exactly and is often offset from them, and part of the line may be distorted, for example by warping of the paper. In such cases the line joining the cell vertices no longer coincides with the diagonal line, and extraction fails.

[0006]

[Problems to Be Solved by the Invention] As described above, with conventional techniques it has been difficult, when reading and recognizing a document that contains ruled lines such as a table, to recognize the ruled lines correctly and reproduce them correctly in document data.

[0007] An object of the present invention is to provide a ruled-line extraction method that can correctly extract ruled lines from a read image and reproduce them in document data.

[0008]

[Means for Solving the Problems] The ruled-line extraction method of the present invention comprises: a ruled-line position estimation step of estimating the positions of ruled lines from a reduced image of an input image; a step of extracting a region enclosed on four sides by ruled lines extracted from the input image; a step of extracting such regions by recursive processing when the region has a nested structure; a step of terminating the recursive processing when the region to be extracted becomes smaller than the size of the characters inside or around the region; a ruled-line determination step of projecting, using the high-resolution input image, the pixel density of the ruled line at the estimated ruled-line position in the direction of the ruled line, and determining from the number of peaks of the pixel density how many lines the ruled line consists of; a step of determining whether diagonal components exist at or around opposing vertices of a region enclosed on four sides by ruled lines extracted from the input image; and a step of determining whether a diagonal line exists in the region lying between the diagonal components at or around the opposing vertices.

[0009] The ruled-line extraction apparatus of the present invention comprises: ruled-line position estimation means for estimating the positions of ruled lines from a reduced image of an input image; means for extracting a region enclosed on four sides by ruled lines extracted from the input image; means for extracting such regions by recursive processing when the region has a nested structure; means for terminating the recursive processing when the region to be extracted becomes smaller than the size of the characters inside or around the region; ruled-line determination means for projecting, using the high-resolution input image, the pixel density of the ruled line at the estimated ruled-line position in the direction of the ruled line, and determining from the number of peaks of the pixel density how many lines the ruled line consists of; means for determining whether diagonal components exist at or around opposing vertices of a region enclosed on four sides by ruled lines extracted from the input image; and means for determining whether a diagonal line exists in the region lying between the diagonal components at or around the opposing vertices.

[0010] According to the present invention, even when a document image captured by a scanner or the like contains ruled lines, extracting and identifying those ruled lines and combining the result with character recognition allows the captured image data to be converted into document data containing ruled lines that can be edited in word-processing software or the like. The read image can therefore be edited on an information processing apparatus, which improves work efficiency when, for example, forms are scanned.

[0011]

[Embodiments of the Invention] For double lines, the input image is first reduced in such a way that the image tends to be filled in, and ruled-line candidates are extracted from this reduced image. Using a reduced image makes ruled lines easier to extract even when they are faint or broken. Any method may be used for this extraction as long as it yields the regions of ruled-line candidates. Next, the position in the input image of each extracted ruled-line candidate region is calculated, and the input image is used to determine whether the candidate region is a double line. The determination is made for each of the small regions obtained by finely dividing one ruled-line candidate. Black pixels within the small region are projected in the ruled-line direction, and if the projection has two peaks the small region is judged to be a double line. Peaks are judged using fixed values a and b (a ≥ b): if, in the projection result, a region whose black pixel count is b or less (background) lies between regions whose black pixel count is a or more (positions where a line exists), two lines are taken to be present and the region is judged to be a double line. If at least a fixed proportion of a ruled-line candidate is judged to be a double line, the candidate as a whole is judged to be a double line.
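The peak test with the two thresholds a and b can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function name `count_line_peaks` and the sample profiles are hypothetical, and the profile is assumed to be a list of black pixel counts, one per position across the ruled line.

```python
def count_line_peaks(profile, a, b):
    """Count runs with value >= a that are separated by a value <= b.

    profile: black pixel counts across the ruled line (assumed input format).
    a, b:    the thresholds from the text, with a >= b.
    """
    if a < b:
        raise ValueError("the text requires a >= b")
    peaks = 0
    in_peak = False
    gap_seen = True  # the first peak needs no preceding background gap
    for value in profile:
        if value >= a:
            if not in_peak and gap_seen:
                peaks += 1
                gap_seen = False
            in_peak = True
        else:
            in_peak = False
            if value <= b:
                gap_seen = True  # real background, so the next peak is a new line
    return peaks


double_profile = [0, 9, 9, 0, 8, 9, 0]   # two lines with background between them
blotted_profile = [0, 9, 4, 9, 0]        # dip never reaches background level
print(count_line_peaks(double_profile, a=6, b=1))
print(count_line_peaks(blotted_profile, a=6, b=1))
```

Requiring the dip to fall to b or below, rather than merely under a, is what keeps a single thick or noisy line from being counted as two.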

[0012] Even when a line is judged to be a double line, if the gap between the two lines is at least a fixed distance, they are treated not as a double line but as two separate ruled lines. This distance is determined from the size of the surrounding characters, which is calculated for the purpose.

[0013] When cells are nested in multiple levels, the structure is not known in advance, so a decision is needed on when to stop the recursive processing that extracts nested cells. Recursion is stopped when the height and width of a nested cell fall to or below a fixed size. This size is determined from the estimated size of the surrounding characters, which prevents erroneous extraction of nested structures too small to contain any characters.

[0014] When extracting a diagonal line drawn between the vertices of a cell (a box enclosed by ruled lines) in a table in the image, the diagonal line is extracted not as a single line but as a set of small diagonal components. First, it is determined whether diagonal components exist at or near the vertices of the cell. If diagonal components exist at or near opposing vertices, the area between the extracted diagonal components is examined to determine whether a diagonal line is present.

[0015] Embodiments of the present invention are described in more detail below.

1. Input image

The input image is assumed to be a binary image without extreme skew. If an image is input with skew, it is first corrected to a nearly horizontal orientation by a known orientation-correction method, and the method of this embodiment is then applied. Color and grayscale images are converted to binary images by a known method before the following processing is performed.

2. Connected pattern extraction

So that each pattern can be picked up stably, independently of the relative positions of multiple frames, patterns of black pixels that are 8-connected (connected in the up, down, left, right, upper-right, lower-right, upper-left, or lower-left direction) are extracted by labeling (see Japanese Patent Application No. 7-203259).

[0016] FIG. 1 outlines the labeling process. As shown in (a) of the figure, when an image is captured it is scanned horizontally line by line to check for black pixels, and each black pixel found is given a label such as a sequential number. When line 1 in (a) is scanned, two independent black pixels are found and labeled "1" and "2". When line 2 is scanned, two independent black pixels are again found, but each is 8-connected to a black pixel found on the previous line and labeled "1" or "2", so it receives the label of the pixel it is connected to rather than a new label. The black pixels on line 2 are therefore also labeled "1" and "2". When line 3 is scanned, one black pixel is found and given the new label "3". However, as shown in (b) of the figure, the black pixel labeled "3" is 8-connected to the black pixels labeled "1" and "2", so the labels of those pixels are rewritten to "3". In this way the black pixels shown in (b) are extracted as a single block of 8-connected black pixels.
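As a rough illustration of 8-connected labeling, the sketch below uses a common two-pass variant. Instead of rewriting earlier labels when two blocks turn out to be connected, as the text describes, it records the equivalence in a union-find table and resolves it in a second pass. The function name `label_8_connected` and the list-of-rows image format (1 = black, 0 = white) are assumptions made for the example.

```python
def label_8_connected(image):
    """Label 8-connected black pixels. Returns a grid of labels (0 = background)."""
    height = len(image)
    width = len(image[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    parent = [0]  # union-find table; index 0 is reserved for background

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[max(rx, ry)] = min(rx, ry)

    # First pass: scan line by line, looking at the already visited neighbours
    # (left, upper-left, upper, upper-right).
    for y in range(height):
        for x in range(width):
            if not image[y][x]:
                continue
            seen = []
            for dy, dx in ((0, -1), (-1, -1), (-1, 0), (-1, 1)):
                ny, nx = y + dy, x + dx
                if ny >= 0 and 0 <= nx < width and labels[ny][nx]:
                    seen.append(labels[ny][nx])
            if seen:
                smallest = min(seen)
                labels[y][x] = smallest
                for other in seen:
                    union(smallest, other)  # these labels belong to one block
            else:
                parent.append(len(parent))
                labels[y][x] = len(parent) - 1

    # Second pass: replace every label by a compact number for its root.
    compact = {}
    for y in range(height):
        for x in range(width):
            if labels[y][x]:
                root = find(labels[y][x])
                labels[y][x] = compact.setdefault(root, len(compact) + 1)
    return labels


# Two strokes that join on the third line, similar in spirit to FIG. 1.
image = [
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 1, 1, 1, 0],
]
for row in label_8_connected(image):
    print(row)
```

The end result matches the behavior described in the text: all pixels of one 8-connected block carry the same label, even if they started out with different ones.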

[0017] The above processing is applied to the entire captured image.

3. Mask processing

Mask processing thins the vertical and horizontal components in order to remove extreme diagonal components from the image and make the long straight lines that exist only in frames easier to extract. In other words, the straight-line components in the image are expressed using only vertical and horizontal line segments, so a long horizontal line is expressed as a chain of horizontal segments. To extract straight-line candidates accurately, the processing uses a low-resolution image equivalent to 100 dpi (see Japanese Patent Application No. 7-203259).

4. Line segment extraction

Adjacent projection values are calculated for the mask-processed image, and straight lines or parts of straight lines are detected in the vertical and horizontal directions by approximating them with rectangles. Adjacent projection is a projection method in which the projection values of the surrounding rows or columns are added to the projection value of a given row or column, and the sum is used as the projection value; it captures the surroundings in a global way (see Japanese Patent Application No. 5-103257). The same processing is performed in both the vertical and horizontal directions. This step uses the image produced by mask processing.
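Adjacent projection for rows can be sketched as below. The function name `adjacent_row_projection` and the `window` parameter (how many neighbouring rows on each side are added) are assumptions for illustration; the text does not state how many neighbours are used.

```python
def adjacent_row_projection(image, window=1):
    """Row projection where each row also includes its neighbouring rows.

    image:  list of rows of 0/1 pixels (1 = black), an assumed input format.
    window: number of rows above and below to add (hypothetical parameter).
    """
    plain = [sum(row) for row in image]  # ordinary projection: black pixels per row
    result = []
    for i in range(len(plain)):
        lo = max(0, i - window)
        hi = min(len(plain), i + window + 1)
        result.append(sum(plain[lo:hi]))
    return result


# A slightly tilted line: half of it falls on row 1, half on row 2.
image = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
print([sum(row) for row in image])       # plain projection
print(adjacent_row_projection(image))    # adjacent projection
```

In the plain projection the tilted line only reaches half its length on any single row, while the adjacent projection recovers the full length on the rows it crosses, which is why it tolerates slight skew.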

[0018] FIG. 2 illustrates the result of line segment extraction. In the figure, a slightly inclined straight line is represented by three rectangles. Relatively short, horizontally elongated rectangles are used so that a long straight line is not represented by a single rectangle. Various horizontal and vertical straight lines are thus expressed as chains of rectangles.

5. Straight line extraction

Among the rectangle-approximated line segments obtained by line segment extraction, neighboring segments are merged to detect long straight lines. Each detected straight line is itself approximated by a rectangle (see Japanese Patent Application No. 7-203259).
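Merging neighboring rectangles into one enclosing rectangle might look like the sketch below. It is a simplified greedy pass under stated assumptions: rectangles are `(x0, y0, x1, y1)` tuples with inclusive pixel coordinates, and `max_gap` is a hypothetical distance threshold; the referenced application may use a different merging rule.

```python
def rect_gap(r, s):
    """Horizontal and vertical gap between two rectangles (0 if they overlap on that axis)."""
    dx = max(r[0] - s[2], s[0] - r[2], 0)
    dy = max(r[1] - s[3], s[1] - r[3], 0)
    return dx, dy


def merge_segments(rects, max_gap=1):
    """Greedily merge rectangles that touch or lie within max_gap of each other.

    A single left-to-right pass; it does not re-check earlier merged groups
    against each other, which is enough for chains of short segments.
    """
    merged = []
    for r in sorted(rects):
        for i, m in enumerate(merged):
            dx, dy = rect_gap(m, r)
            if dx <= max_gap and dy <= max_gap:
                # Grow the group to the rectangle that contains both.
                merged[i] = (min(m[0], r[0]), min(m[1], r[1]),
                             max(m[2], r[2]), max(m[3], r[3]))
                break
        else:
            merged.append(r)
    return merged


# Three short rectangles stepping down along a slightly tilted line,
# plus one unrelated segment far below.
segments = [(0, 10, 9, 11), (10, 11, 19, 12), (20, 12, 29, 13), (0, 50, 9, 51)]
print(merge_segments(segments))
```

The three stepped rectangles collapse into one rectangle covering the whole tilted line, while the distant segment stays separate.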

[0019] FIG. 3 illustrates straight line extraction. When line segments are extracted from a straight line, they are approximated by rectangles. If two of these rectangles touch or are closer than a predetermined distance, they are judged to form parts of the same straight line. The set of rectangles judged in this way eventually covers the whole of one straight line, and the detected straight line is approximated by a rectangle that contains all of them.

6. Double line extraction

To determine double lines, images of different resolutions are used for different purposes. Candidate positions of short line segments are extracted from the reduced image, and the original image is then used to determine, by projection, whether each segment is a double line. The results are then merged and extracted as a double line.

[0020] FIG. 4 illustrates OR thinning. As shown in (a) of the figure, when an image is reduced to one half, OR thinning takes the OR of the pixel values in each block of four pixels and replaces the block with one pixel. That is, if even one of the four pixels is black, the pixel after reduction is black.

[0021] Part (b) of the figure shows an example of OR thinning. The image on the left of (b) is the original image. It is divided into blocks of four pixels and the pixel values are ORed within each block. A block containing even one black pixel is replaced by one black pixel after reduction, and only when all four pixels are white does the reduced pixel become white. The result, shown on the right of (b), is a coarser, reduced image.
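A minimal sketch of one-half OR thinning follows. The function name `or_thin_half` and the list-of-rows format (1 = black, 0 = white) are assumptions; edge blocks of an odd-sized image are simply ORed over the pixels that exist, which the text does not specify.

```python
def or_thin_half(image):
    """Reduce a binary image to one half by ORing each 2x2 block of pixels."""
    height, width = len(image), len(image[0])
    reduced = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            block = [
                image[yy][xx]
                for yy in range(y, min(y + 2, height))
                for xx in range(x, min(x + 2, width))
            ]
            # Black if any pixel in the block is black, white only if all are white.
            row.append(1 if any(block) else 0)
        reduced.append(row)
    return reduced


original = [
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
for row in or_thin_half(original):
    print(row)
```

Because a single black pixel is enough to blacken the reduced pixel, small breaks in a faint line tend to close up, which is the property the method relies on when looking for ruled-line candidates.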

[0022] For the OR-thinned image, the results already obtained in the line segment extraction described above are used. Next, the part of the original image corresponding to each extracted candidate position is examined to determine in detail whether it is a double line. The determination is made for each small region forming the ruled line (referred to here as a line segment), by projecting, parallel to the segment, the distribution of black pixels in the direction perpendicular to the segment. Projection values are taken along the line direction within the segment; if there are two or more positions where the projection value is at or above one fixed value, and between them a position where the projection value is at or below another fixed value, the segment is judged to be a double line.

[0023] FIG. 5 illustrates the projection of black pixels. Within a rectangle that piecewise approximates a straight line, black pixels are summed inside the rectangle in the direction parallel to the segment, giving a histogram of the black pixel distribution in the direction perpendicular to the segment. Parts where the histogram exceeds a predetermined value are taken to be part of a line, and parts where it is below a separately determined predetermined value are taken to be blank. If the distribution has two peaks, the line can be judged to be a double line. The predetermined values (thresholds) used to judge peaks and valleys are to be set as appropriate by a person skilled in the art using this embodiment; they can be determined experimentally, but may also be determined by other means.

[0024] The line type may also change within a single straight line. Because the line type changes at cell boundaries, the end points are located using the positions where ruled lines intersect. If part of a ruled line between two intersections is judged to be a double line, the double-line determination is performed again between the part judged to be a double line and the intersections. If at least a fixed proportion of the length between the intersections is judged to be a double line, the ruled line between those intersections is judged to consist of a double line.
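One way to picture the per-span decision is the sketch below: the ruled line is cut at its intersections, and each span is classified by how much of its length was judged double. The function name `classify_spans`, the 1-D coordinate model, and the `min_ratio` value are all assumptions for illustration; the text leaves the proportion to the implementer.

```python
def classify_spans(line_start, line_end, intersections, double_ranges, min_ratio=0.5):
    """Split a ruled line at its intersections and classify each span.

    intersections: positions along the line where other ruled lines cross it.
    double_ranges: non-overlapping (start, end) stretches judged to be double.
    min_ratio:     hypothetical proportion of a span that must be double.
    Returns a list of (span_start, span_end, is_double).
    """
    inner = [p for p in intersections if line_start < p < line_end]
    cuts = sorted({line_start, line_end, *inner})
    result = []
    for start, end in zip(cuts, cuts[1:]):
        # Length of this span covered by stretches judged to be double.
        covered = sum(
            max(0, min(end, r_end) - max(start, r_start))
            for r_start, r_end in double_ranges
        )
        result.append((start, end, covered / (end - start) >= min_ratio))
    return result


# A line from 0 to 100 crossed at 40; the left part is mostly double.
print(classify_spans(0, 100, [40], [(0, 30), (35, 40)]))
```

Classifying span by span is what lets one physical line be reported as double in one cell and single in the next.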

[0025] Furthermore, even when a double line is partly filled in, as if blotted with ink, it is treated as a double line if the portion judged to be a double line is at least a fixed proportion of the length of the ruled line in question. In addition, by splitting the ruled line where its thickness changes, part of it can be judged to be a double line even when the line type changes within a single ruled line.

[0026] The fixed values used above to judge double lines are likewise to be set as appropriate by a person skilled in the art. When the two lines are far apart, they must be extracted as two completely separate lines. Accordingly, two ruled lines are regarded as one double line only when the gap between them is at or below a fixed value. The threshold for the gap is calculated from the size of the surrounding characters: if the gap is at least a fixed proportion of the height or width of the surrounding characters, the lines are treated as two lines; if it is at or below that proportion, they are treated as a double line.
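The gap rule can be written down in a few lines. In this sketch the function name `classify_line_pair` and the default `ratio` of 0.3 are hypothetical; the text only says that the threshold is a fixed proportion of the surrounding character height or width, and it does not say which side a gap exactly equal to the threshold falls on (here it counts as a double line).

```python
def classify_line_pair(gap, char_size, ratio=0.3):
    """Decide whether two parallel lines form a double line or two ruled lines.

    gap:       distance between the two lines, in pixels.
    char_size: height or width of the surrounding characters, in pixels.
    ratio:     hypothetical proportion of char_size used as the threshold.
    """
    threshold = ratio * char_size
    return "two lines" if gap > threshold else "double line"


print(classify_line_pair(gap=3, char_size=20))
print(classify_line_pair(gap=12, char_size=20))
```

Tying the threshold to character size rather than to a pixel constant keeps the rule stable across scan resolutions and font sizes.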

[0027] The above embodiment describes double lines, but the same applies to triple lines and beyond.

7. Cell extraction

Cells are extracted using the lines judged to be ruled-line candidates in the processing above. Any method may be used to extract cells, which are regions enclosed on four sides by straight lines (see Japanese Patent Application No. 7-203259).

8. Nested ruled-line cell extraction

FIGS. 6 and 7 illustrate a method of extracting the ruled lines of a nested structure.

[0028] Nested parts are handled by treating the inside of a cell as a table and processing it recursively. Conventionally, a limit was placed on the number of recursions, because simply increasing the number of recursions leads to excessive extraction of small nested structures formed by ruled lines erroneously extracted from characters. To support nesting of three or more levels (the thick lines in FIG. 6 mark a triple nest), the character size around the cell of interest is compared with the length of the ruled line being processed as a nest, so that the decision to stop recursion is made automatically.

[0029] FIG. 7 shows an example. The lines indicated by arrows in (a) and (b) of the figure each form a single-level nest, and the two figures show the same format. In (a), however, the line is a ruled line erroneously extracted from part of a character, whereas in (b) it is a genuine ruled line. If the line indicated by the arrow were a ruled line forming a nest, the nested cell in (a) would be too small relative to the character size and so appears inappropriate. In (b) the characters are small, so the nested cells separated by the arrowed line are sufficiently large relative to the character size and are appropriate. The size of the cell that nesting would produce is therefore compared with the estimated size of the surrounding characters. If the cell width is larger than a fixed proportion of the character size, the cell is processed as a nest; if the cell is too small relative to the character size, no further recursive nest extraction is performed.
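The stopping rule for the recursion can be sketched as follows. The data model is an assumption made for the example: each cell is a `(width, height, candidates)` tuple, where `candidates` are the nested cells that the ruled-line candidates inside it would produce. The function name `extract_nested` and the `min_ratio` parameter are hypothetical.

```python
def extract_nested(cell, char_size, min_ratio=1.0):
    """Recursively keep nested cells that are large enough to hold a character.

    cell:      (width, height, candidates), an assumed representation.
    char_size: estimated size of the surrounding characters, in pixels.
    min_ratio: hypothetical proportion of char_size a nested cell must reach.
    """
    width, height, candidates = cell
    limit = min_ratio * char_size
    kept = []
    for child in candidates:
        child_width, child_height, _ = child
        if child_width < limit or child_height < limit:
            # Too small relative to the characters: the ruled line is probably
            # a stroke of a character, so do not recurse into this candidate.
            continue
        kept.append(extract_nested(child, char_size, min_ratio))
    return (width, height, kept)


# A table cell split into two halves; the left half contains an 8-pixel-wide
# "cell" produced by a character stroke mistaken for a ruled line.
table = (200, 100, [
    (100, 100, [(8, 100, [])]),
    (100, 100, []),
])
print(extract_nested(table, char_size=20))
```

No recursion depth is fixed anywhere: the recursion ends on its own once candidate cells become smaller than the characters around them.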

[0030] FIG. 8 illustrates another method of extracting nested structures. When a pattern such as part of a character touches a ruled line, and, as in (a) of the figure, parts of the touching character pattern lie on both sides of the ruled line and run in the same direction as the ruled line, the parts are regarded as originally forming one connected pattern, as in (b), and the character size is calculated on that basis ((b) and (c) show a part cut out of (a)). The resulting character size is that of the rectangle in (c), and it is used to calculate the average character size.

[0031] That is, when the character size obtained in this way is smaller than the width of the cell formed by the ruled line, the ruled line is judged to have been erroneously extracted. In the case of FIG. 8, the line is judged to have been erroneously extracted because the horizontal strokes of the character 「さ」 are continuous. Thus, the method described with FIG. 7 judges whether a ruled line was erroneously detected from the average size of the characters outside the cell formed by that ruled line, but this does not work well when the cell in question contains characters smaller than the surrounding ones. By extracting the characters contained in the cell in question, as described with FIG. 8, and using their size to judge whether the ruled line was erroneously detected, the judgment becomes more accurate.

9. Diagonal line extraction

FIGS. 9 to 11 illustrate diagonal line extraction.

[0032] A diagonal line is a line that divides the inside of a cell diagonally, as in FIG. 9. Extraction uses a diagonal-line extraction filter whose slope equals the slope of the line joining the cell's vertex coordinates. The processing unit for diagonal line extraction is set to a size slightly larger than a typical character. The filter may be, for example, an elongated rectangular region inclined at a predetermined angle.

[0033] However, because the start and end points of a diagonal line often do not coincide with the cell vertices, the filter position is varied near each cell vertex to find the ends of the diagonal line. If diagonal components are found at opposing positions near the vertices, the cell may contain a diagonal line, so the central part of the cell is examined, and the cell is judged to contain a diagonal line only if one is also present in the central part. In the central part, diagonal components are extracted in diagonal-line extraction units as in FIG. 9; if the extracted units chain together into a long diagonal line, a long diagonal line is taken to exist.

[0034] In diagonal line extraction, the length of a diagonal line filter of a certain fixed size is used as the unit: the diagonal line is divided into small regions, extraction is performed on each, and a straight line having a known angle is extracted. For extraction, a pixel search is performed between two points according to the set angle, and a projection value in the oblique direction is obtained. If the calculated value is at or above a certain value, a diagonal line connecting the two points is judged to exist. The angle is estimated from the cell vertex coordinates as the angle of the line connecting opposing vertices of the cell, and the pixel search is performed along that angle. Even if a point examined in the pixel search is a white pixel, it is regarded as black if there is a black pixel within a certain range of it. If the resulting black pixel density is at or above a certain level, a diagonal line is judged to exist.
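
The pixel search with a tolerance range can be illustrated with a minimal Python sketch. It is not taken from the patent: the image representation (a list of rows, 1 for black), the function names, the square tolerance window, and the threshold value 0.8 are all assumptions chosen for illustration.

```python
def has_black_near(img, x, y, tolerance):
    """True if any black pixel (value 1) lies within `tolerance` pixels of (x, y)."""
    height, width = len(img), len(img[0])
    for yy in range(max(0, y - tolerance), min(height, y + tolerance + 1)):
        for xx in range(max(0, x - tolerance), min(width, x + tolerance + 1)):
            if img[yy][xx]:
                return True
    return False


def diagonal_density(img, p0, p1, tolerance=1):
    """Fraction of sampled points on the segment p0 -> p1 that count as black."""
    (x0, y0), (x1, y1) = p0, p1
    steps = max(abs(x1 - x0), abs(y1 - y0))
    hits = 0
    for i in range(steps + 1):
        t = i / steps if steps else 0.0
        x = round(x0 + t * (x1 - x0))
        y = round(y0 + t * (y1 - y0))
        # A white sample still counts if a black pixel is close enough.
        if has_black_near(img, x, y, tolerance):
            hits += 1
    return hits / (steps + 1)


def has_diagonal(img, p0, p1, tolerance=1, min_density=0.8):
    """Judge a diagonal line to exist when the density reaches the threshold (assumed 0.8)."""
    return diagonal_density(img, p0, p1, tolerance) >= min_density
```

With `tolerance=0` the sketch reduces to a strict projection along the line; a larger tolerance lets slightly broken or displaced strokes still count, which is the effect the paragraph above describes.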

[0035] FIG. 10 is a diagram for explaining diagonal line extraction when a cell has rounded corners. Many cells have rounded corners, and the end point of a diagonal line is often located not at the center of the arc of the rounded corner but at one of its ends. Therefore, for a cell with rounded corners, extraction is performed on the assumption that the end point of the diagonal line may lie not only at the center of the rounded corner, which is the vertex of the cell, but also at either end of the arc. Using the location and approximate size of the rounded corner, extraction is performed for a total of three candidate diagonal lines: one starting at the center of the arc of the rounded corner and two starting at either end of the arc.

[0036] FIG. 10 shows the case where the end point of the diagonal line is at the right end point of the arc of the rounded corner of the cell. The angle of the diagonal line is estimated to be the angle of the line connecting the part of the arc under examination and the opposing corner. In this way the diagonal line at the corner can be detected; the middle part of the diagonal line is detected by the method described above.
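
As a rough illustration of the three candidate start points, the following Python sketch computes them for the top-left corner of a cell and estimates each candidate angle toward the opposing (bottom-right) corner. The geometry is an assumption: the corner is modeled as a quarter circle of known radius, and "center of the arc" is read as the midpoint of that arc. The function and key names are hypothetical.

```python
import math


def rounded_corner_candidates(left, top, right, bottom, radius):
    """Return {name: (start_point, angle_in_degrees)} for a rounded top-left corner.

    Assumes the corner is a quarter circle of the given radius and that the
    opposing corner is the bottom-right vertex (right, bottom).
    """
    # Midpoint of a quarter-circle arc whose center of curvature is
    # (left + radius, top + radius).
    offset = radius * (1 - 1 / math.sqrt(2))
    starts = {
        "arc_middle": (left + offset, top + offset),
        "arc_end_on_top_side": (left + radius, top),
        "arc_end_on_left_side": (left, top + radius),
    }
    candidates = {}
    for name, (x, y) in starts.items():
        angle = math.degrees(math.atan2(bottom - y, right - x))
        candidates[name] = ((x, y), angle)
    return candidates
```

Each candidate can then be passed to whatever diagonal detection is used at the vertices, and the corner is treated as having a diagonal component if any of the three candidates succeeds.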

[0037] Because the above method uses the black pixel density along the direction between the two set points, it extracts diagonal lines erroneously when applied to a cell with a high black pixel density, such as a black-and-white inverted cell. To prevent this, the pixel density around the part extracted as a diagonal line is examined, and the part is judged to be a diagonal line only if its surroundings are not filled in. At locations away from the region judged to be a diagonal line (for example, in the range of 1/3 to 2/3 of the side length of the cell; see FIG. 11), the presence of a diagonal line with the same angle as the slope obtained from the cell vertices is tested. If the pixel density there is low and there are places with no diagonal line, the detected black pixels are finally judged to belong to a diagonal line. If the surrounding pixel density is high, the cell of interest is judged to be a cell with a high black pixel density and not a cell containing a diagonal line.
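
The surrounding-density check can be sketched in Python as follows. This is an illustrative reading, not the patent's implementation: the probe lines are taken parallel to the cell diagonal and offset by 1/3 and 2/3 of the side lengths, the cell is given as `(left, top, right, bottom)` pixel coordinates, and the density limit 0.5 is an assumed value.

```python
def line_density(img, p0, p1):
    """Fraction of black pixels (value 1) sampled along the segment p0 -> p1."""
    (x0, y0), (x1, y1) = p0, p1
    steps = max(abs(x1 - x0), abs(y1 - y0), 1)
    black = 0
    for i in range(steps + 1):
        x = round(x0 + (x1 - x0) * i / steps)
        y = round(y0 + (y1 - y0) * i / steps)
        if img[y][x]:
            black += 1
    return black / (steps + 1)


def passes_surrounding_check(img, cell, max_density=0.5):
    """True if at least one parallel probe line away from the diagonal is mostly white."""
    left, top, right, bottom = cell
    width, height = right - left, bottom - top
    for fraction in (1 / 3, 2 / 3):
        dx, dy = round(width * fraction), round(height * fraction)
        probes = [
            ((left + dx, top), (right, bottom - dy)),   # diagonal shifted to the right
            ((left, top + dy), (right - dx, bottom)),   # diagonal shifted downward
        ]
        for start, end in probes:
            if line_density(img, start, end) < max_density:
                return True
    return False
```

A filled or inverted cell yields a high density on every probe line and is rejected, whereas a cell with a single thin diagonal leaves the probe lines white and passes.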

[0038] In the present embodiment, because the determination is whether a diagonal line exists between two given points, diagonal lines can be detected even when they cross within a cell. Even when a diagonal line overlaps characters, it can still be extracted, because the black pixel density of the diagonal line part differs from that of the character part. The embodiment also has the following features.
1) For a black-and-white binary input image, a reduced image is created. Means are provided for estimating ruled line positions using the reduced image, and for determining the number of ruled lines, using a higher-resolution image, from the number of peaks in the pixel density projected in the ruled line direction within the estimated ruled line position. This makes it possible to extract double lines that are blurred or partially crushed.
2) In 1) above, the surrounding character size is estimated, and a fixed proportion of its height or width is set as the maximum spacing between ruled lines that are regarded as a double line. If the spacing between two ruled lines is equal to or larger than this value, they are treated as two separate ruled lines. This makes it possible to distinguish a small cell from a double line.
3) In the process of extracting regions (cells) enclosed on four sides by ruled lines from the extracted ruled line candidates and determining the ruled lines, a structure is provided that allows nested portions to be processed recursively. The height or width of the entered characters is compared with the side length of the cell; if the side length is smaller than a threshold calculated from the character height or width, the recursion is stopped, and otherwise it continues. By extracting nested cells in this way, ruled lines and cells can be extracted accurately whether the nesting is multi-level or the structure is simple.
4) For a black-and-white binary input image, in order to extract diagonal lines drawn between mutually opposing vertices of a cell in a table, means are provided for determining whether a diagonal component exists at the opposing vertices and in their surrounding regions, and, when diagonal components exist at both opposing vertices and their surroundings, for determining whether a diagonal line exists in the region between them. Cells that contain no diagonal line can thus be identified early in the processing, which shortens the processing time.
5) In the means for extracting the diagonal component in 4) above, a small rectangle is used as the unit region, and its aspect ratio is calculated from the slope estimated from the vertex information of the target cell. The pixel density on the straight line connecting the vertices of the unit region is calculated; if it is at or above a certain value, the unit region is regarded as containing a diagonal component. The diagonal component between two points is extracted as a set of unit regions, so processing is possible even with some distortion or blurring.
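
The recursive handling of nested cells in feature 3) can be sketched as below. The data layout (a dictionary per candidate cell with `name`, `width`, `height`, and an optional `inner` list) and the `ratio` parameter used to derive the threshold from the character size are hypothetical; the patent does not specify them.

```python
def collect_cells(cell, char_size, ratio=1.0):
    """Recursively collect cell names, stopping where a cell is smaller than the threshold.

    `cell` is a hypothetical dict: {"name", "width", "height", "inner": [...]}.
    The threshold is assumed to be char_size * ratio.
    """
    threshold = char_size * ratio
    if min(cell["width"], cell["height"]) < threshold:
        # Too small to hold a character: treat as a false cell and stop recursing.
        return []
    found = [cell["name"]]
    for inner in cell.get("inner", []):
        found.extend(collect_cells(inner, char_size, ratio))
    return found
```

For example, a table whose inner candidate cell is only 8 pixels wide while the characters are about 12 pixels tall would have that candidate dropped, and nothing nested inside it would be examined.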

[0039] FIGS. 12 to 19 are flowcharts of processing according to the embodiment of the present invention. FIG. 12 is a flowchart showing the overall flow of processing. First, in step S1, an image is input. The image is basically a binary image, but it may be a color or grayscale image that has been binarized. Next, in step S2, a reduced image is created by OR thinning or the like. In step S3, labeling is performed to detect connected blocks of black pixels. Next, the mask processing described above is performed (step S4), and line segments are extracted (step S5). Then double line candidates are extracted in step S6, and straight lines are extracted in step S7. In step S8, cells are extracted, and in step S9, the ruled lines forming nests are determined as described above. Then, in step S10, double lines are determined, and in step S11, diagonal lines are extracted.
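
As an illustration of step S2, the sketch below shows one common form of OR thinning: each output pixel is black if any pixel in the corresponding block of the input is black, so thin lines survive the reduction. The function name, the block-based formulation, and the list-of-rows image format are assumptions; the text does not give the exact procedure.

```python
def or_reduce(img, factor):
    """Shrink a binary image by `factor`, OR-ing the pixels of each factor x factor block."""
    height, width = len(img), len(img[0])
    reduced = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            block_has_black = any(
                img[y][x]
                for y in range(top, min(top + factor, height))
                for x in range(left, min(left + factor, width))
            )
            row.append(1 if block_has_black else 0)
        reduced.append(row)
    return reduced
```

Because a single black pixel is enough to mark a block, faint or broken ruled lines tend to become continuous in the reduced image, which suits the rough position estimate; the finer decisions are then made on the higher-resolution image.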

[0040] FIG. 13 is a flowchart showing the double line candidate extraction processing. After line segment extraction is complete, in step S20, black pixels are projected in the ruled line direction for each line segment. In step S21, it is determined whether the projection result has two peaks. (To detect triple or higher-order lines, it is determined whether there are three or more peaks.) If the determination in step S21 is NO, the line segment in question is judged not to be a double line. If it is YES, it is determined in step S22 whether the projection values at the peaks are at or above a certain value. If the determination in step S22 is NO, the line segment is judged not to be a double line. If it is YES, it is determined in step S23 whether the projection value between the peaks is at or below a certain value. If the determination in step S23 is NO, the line segment is judged not to be a double line. If it is YES, the line segment is judged to be a double line.

[0041] When extracting candidates for triple or higher-order lines, not only the number of peaks is checked but also, for example, whether all peak values are at or above a certain value and whether all valley values between the peaks are at or below a certain value.
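
The peak test of steps S20 to S23, extended to any number of lines, might look like the following Python sketch for a horizontal ruled line. The function names, the simple local-maximum peak finder, and the two thresholds `peak_min` and `valley_max` are assumptions made for illustration.

```python
def project_along_rule(img, top, bottom, left, right):
    """S20: count black pixels per row inside the band of a horizontal ruled line."""
    return [sum(img[y][left:right]) for y in range(top, bottom)]


def find_peaks(profile):
    """Indices of local maxima in the projection; a flat plateau counts as one peak."""
    peaks = []
    n = len(profile)
    i = 0
    while i < n:
        j = i
        while j + 1 < n and profile[j + 1] == profile[i]:
            j += 1
        rises = i == 0 or profile[i - 1] < profile[i]
        falls = j == n - 1 or profile[j + 1] < profile[i]
        if profile[i] > 0 and rises and falls:
            peaks.append((i + j) // 2)
        i = j + 1
    return peaks


def is_multiple_line(profile, line_count, peak_min, valley_max):
    """Apply the S21-S23 style checks for a line made of `line_count` strokes."""
    peaks = find_peaks(profile)
    if len(peaks) != line_count:                      # S21: number of peaks
        return False
    if any(profile[p] < peak_min for p in peaks):     # S22: every peak high enough
        return False
    for a, b in zip(peaks, peaks[1:]):                # S23: every valley low enough
        if min(profile[a:b + 1]) > valley_max:
            return False
    return True
```

Calling `is_multiple_line(profile, 2, ...)` corresponds to the double line case, and passing 3 or more as `line_count` corresponds to the triple line case mentioned above.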

[0042] FIG. 14 shows the first process of the double line determination processing. First, in step S25, cells are extracted. In step S26, each side of a cell is taken in turn as the processing target. In step S27, it is determined whether the side contains at least a certain amount of double-line region. If the determination in step S27 is NO, the side is judged not to be a double line; if YES, the side is judged to be a double line.
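
A minimal sketch of the step S27 decision is given below, assuming that the double-line regions found earlier are available as `(start, end)` intervals measured along the side and that "a certain amount" is expressed as a fraction of the side length (`min_ratio`, an assumed parameter).

```python
def side_is_double_line(side_length, double_segments, min_ratio=0.5):
    """True if double-line regions cover at least min_ratio of the side.

    `double_segments` is a list of (start, end) positions along the side;
    overlapping segments are counted only once.
    """
    covered = 0
    last_end = 0
    for start, end in sorted(double_segments):
        start = max(start, last_end, 0)
        end = min(end, side_length)
        if end > start:
            covered += end - start
            last_end = end
    return covered >= min_ratio * side_length
```

Deciding per side rather than per segment means that a double line which is blurred or crushed over part of its length is still treated as a double line along the whole side.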

[0043] FIG. 15 shows the second process of the double line determination processing. In step S30, two adjacent ruled lines are selected. In step S31, the average size of the characters in the cells formed by each of the ruled lines is calculated. Then, in step S32, it is determined whether the spacing between the two ruled lines under examination is larger than the average character size. If the determination in step S32 is YES, the two ruled lines are determined to be independent ruled lines. If it is NO, in step S33 the two ruled lines are judged to form a double line, and the cell formed between them is deleted.
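
The comparison in steps S31 to S33 amounts to a single threshold test, sketched here in Python. The function name, the string return values, and the optional `ratio` factor (corresponding to using a fixed proportion of the character size as the limit) are illustrative assumptions.

```python
def classify_line_pair(position_a, position_b, char_sizes, ratio=1.0):
    """Classify two adjacent ruled lines as 'independent' or 'double'.

    `char_sizes` holds the sizes of characters in the neighboring cells (S31);
    the spacing is compared with their average times `ratio` (S32).
    """
    if not char_sizes:
        raise ValueError("at least one character size is required")
    average = sum(char_sizes) / len(char_sizes)
    spacing = abs(position_b - position_a)
    return "independent" if spacing > average * ratio else "double"
```

When the result is `"double"`, the narrow cell lying between the two lines would be removed from the cell list, as in step S33.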

[0044] FIG. 16 shows the diagonal line extraction processing. First, in step S40, the cell to be processed is set, and in step S41, the slope of the diagonal line is estimated from the cell vertex coordinates. In step S42, the diagonal line extraction unit is set, and in step S43, diagonal line detection of unit length is performed at the cell vertices. Next, in step S44, it is determined whether diagonal line components exist near both of the opposing vertices. If the determination in step S44 is NO, it is judged that no diagonal line exists. If it is YES, in step S45 the diagonal components on the straight line connecting the vertices are extracted. In step S46, it is determined whether the diagonal components on the straight line connecting the vertices account for at least a certain proportion. If the determination in step S46 is NO, it is judged that no diagonal line exists; if YES, it is judged that a diagonal line exists.
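
The flow of steps S40 to S46 can be sketched as follows. This is a simplified illustration under assumptions: the diagonal between the two opposing vertices is split into `n_units` equal units, each unit is tested by strict pixel density along the line, and the thresholds `min_density` and `min_unit_ratio` are invented values. The vertex-shifting search of FIG. 17 is omitted here.

```python
def unit_has_component(img, p0, p1, min_density=0.8):
    """True if the segment p0 -> p1 is covered by black pixels at the required density."""
    (x0, y0), (x1, y1) = p0, p1
    steps = max(abs(x1 - x0), abs(y1 - y0), 1)
    black = 0
    for i in range(steps + 1):
        x = round(x0 + (x1 - x0) * i / steps)
        y = round(y0 + (y1 - y0) * i / steps)
        if img[y][x]:
            black += 1
    return black / (steps + 1) >= min_density


def cell_has_diagonal(img, v0, v1, n_units=4, min_unit_ratio=0.75):
    """Early-exit diagonal test between opposing vertices v0 and v1."""
    # S41/S42: divide the vertex-to-vertex line into units of equal length.
    points = [
        (round(v0[0] + (v1[0] - v0[0]) * k / n_units),
         round(v0[1] + (v1[1] - v0[1]) * k / n_units))
        for k in range(n_units + 1)
    ]
    units = list(zip(points, points[1:]))
    # S43/S44: both end units must contain a diagonal component.
    if not (unit_has_component(img, *units[0]) and unit_has_component(img, *units[-1])):
        return False
    # S45/S46: enough units along the whole line must contain a component.
    found = sum(1 for unit in units if unit_has_component(img, *unit))
    return found / len(units) >= min_unit_ratio
```

Checking the two end units first lets most cells without a diagonal be rejected after examining only a few pixels, which is the source of the time saving described for this flow.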

[0045] FIG. 17 shows the diagonal component extraction processing. In step S50, a fixed length is set as the diagonal line extraction unit. In step S51, the slope of the straight line connecting the cell vertices is taken as the slope of the diagonal component to be extracted. In step S52, two end points are set to define the region in which the presence of a diagonal component is examined. In step S53, black pixels are projected in the slope direction for the region between the two points. In step S54, it is determined whether the projection shows at least a certain number of black pixels. If the determination in step S54 is YES, a diagonal component is judged to exist. If it is NO, in step S55 the two points are shifted left and right within a certain range and black pixel projection is performed. In step S56, it is determined whether the projection shows at least a certain number of black pixels. If the determination in step S56 is YES, a diagonal component is judged to exist. If it is NO, in step S57 the two points are shifted up and down within a certain range and black pixel projection is performed. In step S58, it is determined whether the projection shows at least a certain number of black pixels. If the determination in step S58 is YES, a diagonal component is judged to exist; if NO, it is judged that no diagonal component exists.
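
The search order of steps S53 to S58 (unshifted, then horizontal shifts, then vertical shifts) can be sketched in Python as below. The function names, the return convention (the successful shift as `(dx, dy)`, or `None`), and the `max_shift` range are assumptions for illustration.

```python
def count_black(img, p0, p1):
    """Number of black pixels sampled along p0 -> p1; points outside the image count as white."""
    height, width = len(img), len(img[0])
    (x0, y0), (x1, y1) = p0, p1
    steps = max(abs(x1 - x0), abs(y1 - y0), 1)
    total = 0
    for i in range(steps + 1):
        x = round(x0 + (x1 - x0) * i / steps)
        y = round(y0 + (y1 - y0) * i / steps)
        if 0 <= x < width and 0 <= y < height and img[y][x]:
            total += 1
    return total


def find_diagonal_component(img, p0, p1, min_black, max_shift=2):
    """Return the (dx, dy) shift at which a diagonal component is found, or None."""
    if count_black(img, p0, p1) >= min_black:              # S53/S54: original position
        return (0, 0)
    shifts = [s for d in range(1, max_shift + 1) for s in (-d, d)]
    for dx in shifts:                                      # S55/S56: shift left and right
        a, b = (p0[0] + dx, p0[1]), (p1[0] + dx, p1[1])
        if count_black(img, a, b) >= min_black:
            return (dx, 0)
    for dy in shifts:                                      # S57/S58: shift up and down
        a, b = (p0[0], p0[1] + dy), (p1[0], p1[1] + dy)
        if count_black(img, a, b) >= min_black:
            return (0, dy)
    return None
```

Both end points are moved together, so the slope set in step S51 is preserved while the position is adjusted to where the stroke actually lies.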

[0046] FIG. 18 shows the processing for determining the ruled lines forming a nest. In step S60, horizontal ruled line candidates are selected. In step S61, the average size of the characters within a certain range centered on the selected horizontal ruled line candidate is calculated. The processing of FIG. 19, described later, may be used for step S61. In step S62, it is determined whether the spacing between the paired horizontal ruled lines is larger than the average character size. If the determination in step S62 is NO, the next pair of horizontal ruled lines is selected in step S65. If it is YES, the candidates are judged to be horizontal ruled lines in step S63, and then vertical ruled lines are processed in step S64. Step S64 performs the processing of steps S60 to S63 and step S65 for the vertical ruled lines as well.
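
One possible reading of the pair test in steps S60 to S65 is sketched below. How candidates are paired is not spelled out in the text, so this sketch assumes that each horizontal candidate is compared with the nearest accepted line above it and with the bottom edge of the enclosing cell; the function name and parameters are hypothetical.

```python
def filter_nested_candidates(frame_top, frame_bottom, candidates, avg_char_size):
    """Keep horizontal ruled line candidates that leave room for a character on both sides.

    `candidates` are y positions inside the cell bounded by frame_top and frame_bottom.
    """
    accepted = []
    previous = frame_top
    for y in sorted(candidates):
        room_above = y - previous
        room_below = frame_bottom - y
        if room_above > avg_char_size and room_below > avg_char_size:   # S62
            accepted.append(y)                                          # S63
            previous = y
        # Otherwise move on to the next candidate (S65).
    return accepted
```

The same function could be applied to vertical candidates by passing x positions and the left and right edges of the cell, mirroring step S64.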

[0047] FIG. 19 is a flowchart explaining an alternative method for the processing of step S61 in FIG. 18. In step S66, it is determined whether any pattern touches the selected horizontal ruled line candidate. If the determination in step S66 is NO, in step S70 the average size of the characters within a certain range centered on the selected horizontal ruled line candidate is calculated, and processing proceeds to step S62 in FIG. 18. If it is YES, in step S67, among the patterns touching the horizontal ruled line candidate, portions on either side of the ruled line that have the same direction are regarded as forming the same character. Then, in step S68, the character size is calculated on the assumption that the portions forming the same character are connected. Further, in step S69, the calculated character size is set as the average character size, and processing proceeds to step S62 in FIG. 18.

[0048] FIG. 20 is a diagram for explaining the hardware environment required when the method of the embodiment of the present invention is implemented as a program. When the information processing device 8 starts up, the CPU 11 connected to the bus 10 reads basic software such as the BIOS from the ROM 12, which is also connected to the bus 10, and makes the information processing device 8 usable by the user. The program implementing the embodiment of the present invention may be stored in the ROM 12, but it is generally stored in a storage device 17 such as a hard disk, or on a portable recording medium 19 such as a flexible disk, CD-ROM, or DVD. The program stored on the portable recording medium 19 is copied to the RAM 13 via the reading device 18 and executed by the CPU 11. The program stored in the storage device 17 is likewise copied to the RAM 13 via the bus 10 and executed by the CPU 11.

[0049] The input/output device 20 includes general input/output devices such as a display, keyboard, template, and mouse. To use the information processing device 8 as an image reading device, a scanner or the like must be provided as an input device.

[0050] The communication interface 14 connects the information processing device 8 to the database of the information provider 16 via the network 15 so that data can be exchanged. The program may be downloaded from the database of the information provider 16 to the information processing device 8 and used there, or it may be executed in a network environment while the information processing device 8 remains connected to the database of the information provider 16 or to other information processing devices.

[0051] (Supplementary Note 1) A program for causing an information processing device to implement a ruled line extraction method comprising: a ruled line position estimation step of estimating the position of a ruled line from a reduced image of an input image; and a ruled line determination step of projecting, using the high-resolution input image, the pixel density of the ruled line at the estimated ruled line position in the ruled line direction, and determining from the number of peaks of the pixel density how many lines the ruled line consists of.

[0052] (Supplementary Note 2) The program according to Supplementary Note 1, further comprising a step of determining, by comparing the size of the characters around the estimated ruled line position with the spacing between the lines when the ruled line consists of a plurality of lines, whether the lines at the estimated ruled line position constitute a ruled line made up of a plurality of lines or are separate lines lying side by side.

[0053] (Supplementary Note 3) The program according to Supplementary Note 1, further comprising a step of determining, when the ruled line at the estimated ruled line position is judged to be a ruled line consisting of a plurality of lines over at least a predetermined length, that the entire ruled line is a ruled line consisting of the plurality of lines.

[0054] (Supplementary Note 4) A program for causing an information processing device to implement a ruled line extraction method comprising: a step of extracting a region enclosed on four sides by ruled lines extracted from an input image; a step of extracting the region by recursive processing when the region has a nested structure; and a step of terminating the recursive processing when the region to be extracted becomes smaller than the size of the characters inside or around the region.

[0055] (Supplementary Note 5) A program for causing an information processing device to implement a ruled line extraction method comprising: a step of determining whether a diagonal component exists at, or around, opposing vertices of a region enclosed on four sides by ruled lines extracted from an input image; and a step of determining whether a diagonal line exists in the region lying between the diagonal components at or around the opposing vertices.

[0056] (Supplementary Note 6) The program according to Supplementary Note 5, wherein the step of determining the presence or absence of the diagonal component further comprises: a step of calculating, with a rectangular region as a unit region, the pixel density within the unit region on the straight line connecting the vertices; and a step of determining that a diagonal component exists at the vertex when the pixel density is at or above a certain value.

[0057] (Supplementary Note 7) The program according to Supplementary Note 6, wherein the extraction of the diagonal line in the region lying between the vertices is performed by determining whether diagonal components are contained in the set of unit regions on the estimated diagonal line.

[0058] (Supplementary Note 8) The program according to Supplementary Note 5, wherein the vertex is a rounded corner, and diagonal components are detected at the center and at both end points of the arc of the rounded corner.

[0059] (Supplementary Note 9) A program for causing an information processing device to implement a ruled line extraction method comprising: a ruled line position estimation step of estimating the position of a ruled line from a reduced image of an input image; a step of extracting a region enclosed on four sides by ruled lines extracted from the input image; a step of extracting the region by recursive processing when the region has a nested structure; a step of terminating the recursive processing when the region to be extracted becomes smaller than the size of the characters inside or around the region; a ruled line determination step of projecting, using the high-resolution input image, the pixel density of the ruled line at the estimated ruled line position in the ruled line direction, and determining from the number of peaks of the pixel density how many lines the ruled line consists of; a step of determining whether a diagonal component exists at, or around, opposing vertices of a region enclosed on four sides by ruled lines extracted from the input image; and a step of determining whether a diagonal line exists in the region lying between the diagonal components at or around the opposing vertices.

[0060] (Supplementary Note 10) The program according to Supplementary Note 9, further comprising a step of determining, by comparing the size of the characters around the estimated ruled line position with the spacing between the lines when the ruled line consists of a plurality of lines, whether the lines at the estimated ruled line position constitute a ruled line made up of a plurality of lines or are separate lines lying side by side.

[0061] (Supplementary Note 11) The program according to Supplementary Note 9, further comprising a step of determining, when the ruled line at the estimated ruled line position is judged to be a ruled line consisting of a plurality of lines over at least a predetermined length, that the entire ruled line is a ruled line consisting of the plurality of lines.

[0062] (Supplementary Note 12) The program according to Supplementary Note 9, wherein the step of determining the presence or absence of the diagonal component further comprises: a step of calculating, with a rectangular region as a unit region, the pixel density within the unit region on the straight line connecting the vertices; and a step of determining that a diagonal component exists at the vertex when the pixel density is at or above a certain value.

[0063] (Supplementary Note 13) The program according to Supplementary Note 12, wherein the extraction of the diagonal line in the region lying between the vertices is performed by determining whether diagonal components are contained in the set of unit regions on the estimated diagonal line.

[0064] (Supplementary Note 14) The program according to Supplementary Note 9, wherein the vertex is a rounded corner, and diagonal components are detected at the center and at both end points of the arc of the rounded corner.

[0065] (Supplementary Note 15) A ruled line extraction method comprising: a ruled line position estimation step of estimating the position of a ruled line from a reduced image of an input image; a step of extracting a region enclosed on four sides by ruled lines extracted from the input image; a step of extracting the region by recursive processing when the region has a nested structure; a step of terminating the recursive processing when the region to be extracted becomes smaller than the size of the characters inside or around the region; a ruled line determination step of projecting, using the high-resolution input image, the pixel density of the ruled line at the estimated ruled line position in the ruled line direction, and determining from the number of peaks of the pixel density how many lines the ruled line consists of; a step of determining whether a diagonal component exists at, or around, opposing vertices of a region enclosed on four sides by ruled lines extracted from the input image; and a step of determining whether a diagonal line exists in the region lying between the diagonal components at or around the opposing vertices.

【００６６】（付記１６）前記推定された罫線位置の周
囲の文字の大きさと、該罫線が複数の線からなっている
場合の線間の幅とを比較することにより、当該推定され
た罫線位置にある線は、複数の線からなる罫線か、別個
の線が並んだものかを判断するステップを更に備えるこ
とを特徴とする付記１５に記載の罫線抽出方法。(Supplementary Note 16) The ruled line extraction method according to Supplementary Note 15, further comprising a step of determining whether the lines at the estimated ruled line position form a single ruled line composed of a plurality of lines or are separate lines lying side by side, by comparing the size of the characters around the estimated ruled line position with the spacing between the lines when the ruled line is composed of a plurality of lines.
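
The projection-based line count and the comparison with character size can be sketched as follows. This is a simplified illustration under stated assumptions, not the method as implemented by the applicant: the band is a list of rows of 0/1 pixels around a horizontal ruled line, the names `project_rows`, `find_peaks`, and `classify` and the `threshold` value are hypothetical, and the decision rule (spacing smaller than the character height means one multi-line rule) is an assumed reading of the comparison.

```python
def project_rows(band):
    """Black-pixel density of each row of a band around a horizontal ruled line."""
    return [sum(row) / len(row) for row in band]


def find_peaks(profile, threshold=0.5):
    """Return (start, end) row ranges whose density stays at or above threshold."""
    peaks, start = [], None
    for i, density in enumerate(profile):
        if density >= threshold and start is None:
            start = i
        elif density < threshold and start is not None:
            peaks.append((start, i - 1))
            start = None
    if start is not None:
        peaks.append((start, len(profile) - 1))
    return peaks


def classify(band, char_height, threshold=0.5):
    """Count the lines in the band and label them.

    Assumption: lines spaced more closely than the character height
    belong to one multi-line rule; otherwise they are separate lines.
    """
    peaks = find_peaks(project_rows(band), threshold)
    if len(peaks) < 2:
        return (len(peaks), "single line" if peaks else "no line")
    # Largest number of blank rows between neighbouring peaks.
    gap = max(b[0] - a[1] - 1 for a, b in zip(peaks, peaks[1:]))
    kind = "multi-line rule" if gap < char_height else "separate lines"
    return (len(peaks), kind)


double = [[0] * 8, [1] * 8, [0] * 8, [1] * 8, [0] * 8]
result = classify(double, char_height=10)
```

Here the profile has two peaks separated by one blank row, so with a character height of 10 pixels the band is labelled as a single rule made of two lines.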

【００６７】（付記１７）前記推定された罫線位置にお
ける罫線が、所定の長さ以上に渡って複数の線からなる
罫線であると判断された場合には、該罫線全体が該複数
の線からなる罫線であると判断するステップを更に備え
ることを特徴とする付記１５に記載の罫線抽出方法。(Supplementary Note 17) The ruled line extraction method according to Supplementary Note 15, further comprising a step of determining that the entire ruled line is a ruled line composed of the plurality of lines when the ruled line at the estimated ruled line position is determined to be composed of a plurality of lines over at least a predetermined length.

【００６８】（付記１８）前記斜め方向成分の存否判断
ステップにおいて、ある矩形領域を単位領域として、
前記頂点間を結ぶ直線上の、当該単位領域内の画素密度
を算出するステップと、該画素密度がある一定値以上で
あった時には、該頂点に斜め方向成分が存在すると判断
するステップと、を更に備えることを特徴とする付記１
５に記載の罫線抽出方法。(Supplementary Note 18) The ruled line extraction method according to Supplementary Note 15, wherein the step of determining the presence or absence of the diagonal direction component further comprises: a step of calculating, with a rectangular region as a unit region, the pixel density within the unit region on the straight line connecting the vertices; and a step of determining that a diagonal direction component exists at the vertex when the pixel density is equal to or greater than a certain value.

【００６９】（付記１９）前記頂点に挟まれる領域内の
斜め線の抽出は、推定される斜め線上の前記単位領域の
集合の中に斜め方向成分が含まれるか否かを判断するこ
とによって行われることを特徴とする付記１８に記載の
罫線抽出方法。(Supplementary Note 19) The ruled line extraction method according to Supplementary Note 18, wherein the diagonal line in the region between the vertices is extracted by determining whether a diagonal direction component is included in the set of unit regions on the estimated diagonal line.
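
The unit-region density test along the line joining two opposing vertices can be sketched in Python as below. Several simplifications are assumptions rather than details from this publication: each unit region is approximated by a run of `unit` consecutive pixels sampled on the line instead of a true rectangle, the thresholds `density_min` and `min_fraction` are illustrative values, and the names `line_points`, `unit_densities`, and `has_diagonal` are hypothetical.

```python
def line_points(p0, p1):
    """Integer pixel positions sampled along the segment p0 -> p1."""
    (x0, y0), (x1, y1) = p0, p1
    n = max(abs(x1 - x0), abs(y1 - y0))
    if n == 0:
        return [(x0, y0)]
    return [(round(x0 + (x1 - x0) * t / n), round(y0 + (y1 - y0) * t / n))
            for t in range(n + 1)]


def unit_densities(image, p0, p1, unit=4):
    """Black-pixel density on the line, one value per unit region."""
    pts = line_points(p0, p1)
    densities = []
    for i in range(0, len(pts), unit):
        chunk = pts[i:i + unit]
        densities.append(sum(image[y][x] for x, y in chunk) / len(chunk))
    return densities


def has_diagonal(image, p0, p1, unit=4, density_min=0.75, min_fraction=0.8):
    """Decide whether a diagonal line joins the opposing vertices p0 and p1."""
    densities = unit_densities(image, p0, p1, unit)
    # First check: a diagonal direction component must exist at both vertices.
    if densities[0] < density_min or densities[-1] < density_min:
        return False
    # Second check (assumed rule): enough unit regions on the estimated
    # diagonal must also contain a diagonal direction component.
    hits = sum(d >= density_min for d in densities)
    return hits / len(densities) >= min_fraction


size = 8
diag = [[1 if x == y else 0 for x in range(size)] for y in range(size)]
blank = [[0] * size for _ in range(size)]
found = has_diagonal(diag, (0, 0), (size - 1, size - 1))
```

For the 8 x 8 test image whose main diagonal is black, both unit regions have density 1.0 and the diagonal is accepted; for the blank image the vertex check fails immediately.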

【００７０】（付記２０）前記頂点は、丸角部となって
おり、丸角部の円弧の中心、両端点について、斜め方向
成分の検出を行うことを特徴とする付記１５に記載の罫
線抽出方法。(Supplementary Note 20) The ruled line extraction method according to Supplementary Note 15, wherein the vertex is a rounded corner, and diagonal direction components are detected at the center and at both end points of the arc of the rounded corner.

【００７１】（付記２１）入力画像の縮小画像から罫線
の位置を推定する罫線位置推定手段と、入力画像から抽
出された罫線で四辺を囲まれた領域を抽出する手段と、
該領域が入れ子構造になっている場合に、該領域の抽出
を再帰的処理によって抽出する手段と、該再帰的処理
を、該領域の内部あるいは周辺にある文字の大きさより
も、抽出される領域の方が小さくなった場合に、再帰的
処理を終了する手段と、高解像度の入力画像を用いて、
推定された罫線位置における罫線の画素密度を罫線方向
に投影し、該画素密度のピークの数により、罫線が何本
の線からなっているかを判断する罫線判断手段と、入力
画像から抽出された罫線で四辺を囲まれた領域の向かい
合う頂点あるいは該頂点の周囲に斜め方向成分が存在す
るか否かを判断する手段と、該向かい合う頂点あるいは
その周囲にある該斜め方向成分が挟む領域に斜め線が存
在するか否かを判断する手段と、を備える罫線抽出装
置。(Supplementary Note 21) A ruled line extraction apparatus comprising: ruled line position estimating means for estimating the position of a ruled line from a reduced image of an input image; means for extracting a region enclosed on four sides by ruled lines extracted from the input image; means for extracting the region by recursive processing when the region has a nested structure; means for ending the recursive processing when the region to be extracted becomes smaller than the size of the characters inside or around the region; ruled line determination means for projecting, using the high-resolution input image, the pixel density of the ruled line at the estimated ruled line position in the ruled line direction and determining, from the number of peaks in the pixel density, how many lines make up the ruled line; means for determining whether a diagonal direction component exists at, or around, opposing vertices of the region enclosed on four sides by ruled lines extracted from the input image; and means for determining whether a diagonal line exists in the region between the diagonal direction components at or around the opposing vertices.

【００７２】（付記２２）前記推定された罫線位置の周
囲の文字の大きさと、該罫線が複数の線からなっている
場合の線間の幅とを比較することにより、当該推定され
た罫線位置にある線は、複数の線からなる罫線か、別個
の線が並んだものかを判断する手段を更に備えることを
特徴とする付記２１に記載の罫線抽出装置。(Supplementary Note 22) The ruled line extraction apparatus according to Supplementary Note 21, further comprising means for determining whether the lines at the estimated ruled line position form a single ruled line composed of a plurality of lines or are separate lines lying side by side, by comparing the size of the characters around the estimated ruled line position with the spacing between the lines when the ruled line is composed of a plurality of lines.

【００７３】（付記２３）前記推定された罫線位置にお
ける罫線が、所定の長さ以上に渡って複数の線からなる
罫線であると判断された場合には、該罫線全体が該複数
の線からなる罫線であると判断する手段を更に備えるこ
とを特徴とする付記２１に記載の罫線抽出装置。(Supplementary Note 23) The ruled line extraction apparatus according to Supplementary Note 21, further comprising means for determining that the entire ruled line is a ruled line composed of the plurality of lines when the ruled line at the estimated ruled line position is determined to be composed of a plurality of lines over at least a predetermined length.

【００７４】（付記２４）前記斜め方向成分の存否判断
の手段において、ある矩形領域を単位領域として、前記
頂点間を結ぶ直線上の、当該単位領域内の画素密度を算
出する手段と、該画素密度がある一定値以上であった時
には、該頂点に斜め方向成分が存在すると判断する手段
と、を更に備えることを特徴とする付記２１に記載の罫
線抽出装置。(Supplementary Note 24) The ruled line extraction apparatus according to Supplementary Note 21, wherein the means for determining the presence or absence of the diagonal direction component further comprises: means for calculating, with a rectangular region as a unit region, the pixel density within the unit region on the straight line connecting the vertices; and means for determining that a diagonal direction component exists at the vertex when the pixel density is equal to or greater than a certain value.

【００７５】（付記２５）前記頂点に挟まれる領域内の
斜め線の抽出は、推定される斜め線上の前記単位領域の
集合の中に斜め方向成分が含まれるか否かを判断するこ
とによって行われることを特徴とする付記２４に記載の
罫線抽出装置。(Supplementary Note 25) The ruled line extraction apparatus according to Supplementary Note 24, wherein the diagonal line in the region between the vertices is extracted by determining whether a diagonal direction component is included in the set of unit regions on the estimated diagonal line.

【００７６】（付記２６）前記頂点は、丸角部となって
おり、丸角部の円弧の中心、両端点について、斜め方向
成分の検出を行うことを特徴とする付記２１に記載の罫
線抽出装置。(Supplementary Note 26) The ruled line extraction apparatus according to Supplementary Note 21, wherein the vertex is a rounded corner, and diagonal direction components are detected at the center and at both end points of the arc of the rounded corner.

【００７７】[0077]

【発明の効果】本発明によれば、画像データとして取り
込まれた、表などを含む文書画像から、罫線を正確に抽
出し、文字データだけではなく、表のデータも正確に再
現し、情報処理装置で利用可能とすることができる。[Effects of the Invention] According to the present invention, ruled lines can be accurately extracted from a document image that contains tables and the like and has been captured as image data, so that not only the character data but also the table data can be accurately reproduced and made available to an information processing apparatus.

[Brief description of drawings]

【図１】ラベリング処理を概略説明する図である。FIG. 1 is a diagram schematically illustrating a labeling process.

【図２】線分抽出処理を行った結果を説明する図であ
る。FIG. 2 is a diagram illustrating a result of performing a line segment extraction process.

【図３】直線抽出を説明する図である。FIG. 3 is a diagram illustrating straight line extraction.

【図４】ＯＲ間引き処理を説明する図である。FIG. 4 is a diagram illustrating OR thinning processing.

【図５】黒画素の投影処理について説明する図である。FIG. 5 is a diagram illustrating a black pixel projection process.

【図６】入れ子構造の罫線部分を抽出する方法を説明す
る図（その１）である。FIG. 6 is a diagram (part 1) explaining a method of extracting a ruled line portion of a nested structure.

【図７】入れ子構造の罫線部分を抽出する方法を説明す
る図（その２）である。FIG. 7 is a diagram (part 2) explaining a method of extracting a ruled line portion of a nested structure.

【図８】入れ子構造の別の抽出方法を説明する図であ
る。FIG. 8 is a diagram illustrating another extraction method of a nested structure.

【図９】斜め線の抽出処理を説明する図（その１）であ
る。FIG. 9 is a diagram (part 1) for explaining the oblique line extraction processing.

【図１０】斜め線の抽出処理を説明する図（その２）で
ある。FIG. 10 is a diagram (part 2) for explaining the oblique line extraction processing.

【図１１】斜め線の抽出処理を説明する図（その３）で
ある。FIG. 11 is a diagram (part 3) for explaining the oblique line extraction processing.

【図１２】本発明の実施形態に従った処理のフローチャ
ート（その１）である。FIG. 12 is a flowchart (part 1) of processing according to the embodiment of the present invention.

【図１３】本発明の実施形態に従った処理のフローチャ
ート（その２）である。FIG. 13 is a flowchart (part 2) of the process according to the embodiment of the present invention.

【図１４】本発明の実施形態に従った処理のフローチャ
ート（その３）である。FIG. 14 is a flowchart (part 3) of processing according to the embodiment of the present invention.

【図１５】本発明の実施形態に従った処理のフローチャ
ート（その４）である。FIG. 15 is a flowchart (part 4) of the process according to the embodiment of the present invention.

【図１６】本発明の実施形態に従った処理のフローチャ
ート（その５）である。FIG. 16 is a flowchart (part 5) of processing according to the embodiment of the present invention.

【図１７】本発明の実施形態に従った処理のフローチャ
ート（その６）である。FIG. 17 is a flowchart (part 6) of processing according to the embodiment of the present invention.

【図１８】本発明の実施形態に従った処理のフローチャ
ート（その７）である。FIG. 18 is a flowchart (part 7) of processing according to the embodiment of the present invention.

【図１９】本発明の実施形態に従った処理のフローチャ
ート（その８）である。FIG. 19 is a flowchart (part 8) of processing according to the embodiment of the present invention.

【図２０】本発明の実施形態の方法をプログラムで実現
する場合に必要とされるハードウェア環境を説明する図
である。FIG. 20 is a diagram illustrating a hardware environment required when the method of the embodiment of the present invention is implemented by a program.

[Explanation of symbols]

１０バス１１ＣＰＵ１２ＲＯＭ１３ＲＡＭ１４通信インターフェース１５ネットワーク１６情報提供者１７記憶装置１８読み取り装置１９可搬記録媒体２０入出力装置 10: bus; 11: CPU; 12: ROM; 13: RAM; 14: communication interface; 15: network; 16: information provider; 17: storage device; 18: reading device; 19: portable recording medium; 20: input/output device

フロントページの続き (72)発明者 直井聡 神奈川県川崎市中原区上小田中４丁目１番１号 富士通株式会社内 Ｆターム(参考） 5B029 BB02 CC27 EE12 EE16 EE18 5L096 AA07 EA03 EA04 FA03 FA04 FA10 FA12 FA13 FA16 FA18 FA32 FA36 FA52 FA62 FA64 FA66 FA67 FA69 GA10 GA15 GA23 GA34 GA51 GA55
Continuation of front page: (72) Inventor: Satoshi Naoi, c/o Fujitsu Limited, 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa. F-terms (reference): 5B029 BB02 CC27 EE12 EE16 EE18; 5L096 AA07 EA03 EA04 FA03 FA04 FA10 FA12 FA13 FA16 FA18 FA32 FA36 FA52 FA62 FA64 FA66 FA67 FA69 GA10 GA15 GA23 GA34 GA51 GA55

Claims

[Claims]

1. A program for causing an information processing apparatus to implement a ruled line extraction method comprising: a ruled line position estimating step of estimating the position of a ruled line from a reduced image of an input image; and a ruled line determination step of projecting, using the high-resolution input image, the pixel density of the ruled line at the estimated ruled line position in the ruled line direction and determining, from the number of peaks in the pixel density, how many lines make up the ruled line.

2. The program according to claim 1, further comprising a step of determining whether the lines at the estimated ruled line position form a single ruled line composed of a plurality of lines or are separate lines lying side by side, by comparing the size of the characters around the estimated ruled line position with the spacing between the lines when the ruled line is composed of a plurality of lines.

3. The program according to claim 1, further comprising a step of determining that the entire ruled line is a ruled line composed of the plurality of lines when the ruled line at the estimated ruled line position is determined to be composed of a plurality of lines over at least a predetermined length.

4. A program for causing an information processing apparatus to implement a ruled line extraction method comprising: a step of extracting a region enclosed on four sides by ruled lines extracted from an input image; a step of extracting the region by recursive processing when the region has a nested structure; and a step of ending the recursive processing when the region to be extracted becomes smaller than the size of the characters inside or around the region.

5. A program for causing an information processing apparatus to implement a ruled line extraction method comprising: a step of determining whether a diagonal direction component exists at, or around, opposing vertices of a region enclosed on four sides by ruled lines extracted from an input image; and a step of determining whether a diagonal line exists in the region between the diagonal direction components at or around the opposing vertices.

6. The program according to claim 5, wherein the step of determining the presence or absence of the diagonal direction component further comprises: a step of calculating, with a rectangular region as a unit region, the pixel density within the unit region on the straight line connecting the vertices; and a step of determining that a diagonal direction component exists at the vertex when the pixel density is equal to or greater than a certain value.

7. The program according to claim 6, wherein the diagonal line in the region between the vertices is extracted by determining whether a diagonal direction component is included in the set of unit regions on the estimated diagonal line.

8. A program for causing an information processing apparatus to implement a ruled line extraction method comprising: a ruled line position estimating step of estimating the position of a ruled line from a reduced image of an input image; a step of extracting a region enclosed on four sides by ruled lines extracted from the input image; a step of extracting the region by recursive processing when the region has a nested structure; a step of ending the recursive processing when the region to be extracted becomes smaller than the size of the characters inside or around the region; a ruled line determination step of projecting, using the high-resolution input image, the pixel density of the ruled line at the estimated ruled line position in the ruled line direction and determining, from the number of peaks in the pixel density, how many lines make up the ruled line; a step of determining whether a diagonal direction component exists at, or around, opposing vertices of the region enclosed on four sides by ruled lines extracted from the input image; and a step of determining whether a diagonal line exists in the region between the diagonal direction components at or around the opposing vertices.

9. A ruled line extraction method comprising: a ruled line position estimating step of estimating the position of a ruled line from a reduced image of an input image; a step of extracting a region enclosed on four sides by ruled lines extracted from the input image; a step of extracting the region by recursive processing when the region has a nested structure; a step of ending the recursive processing when the region to be extracted becomes smaller than the size of the characters inside or around the region; a ruled line determination step of projecting, using the high-resolution input image, the pixel density of the ruled line at the estimated ruled line position in the ruled line direction and determining, from the number of peaks in the pixel density, how many lines make up the ruled line; a step of determining whether a diagonal direction component exists at, or around, opposing vertices of the region enclosed on four sides by ruled lines extracted from the input image; and a step of determining whether a diagonal line exists in the region between the diagonal direction components at or around the opposing vertices.

10. A ruled line extraction apparatus comprising: ruled line position estimating means for estimating the position of a ruled line from a reduced image of an input image; means for extracting a region enclosed on four sides by ruled lines extracted from the input image; means for extracting the region by recursive processing when the region has a nested structure; means for ending the recursive processing when the region to be extracted becomes smaller than the size of the characters inside or around the region; ruled line determination means for projecting, using the high-resolution input image, the pixel density of the ruled line at the estimated ruled line position in the ruled line direction and determining, from the number of peaks in the pixel density, how many lines make up the ruled line; means for determining whether a diagonal direction component exists at, or around, opposing vertices of the region enclosed on four sides by ruled lines extracted from the input image; and means for determining whether a diagonal line exists in the region between the diagonal direction components at or around the opposing vertices.