JPH07182459A

JPH07182459A - Table structure extracting device

Info

Publication number: JPH07182459A
Application number: JP5325054A
Authority: JP
Inventors: Shigetaka Ri; 榮隆李
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1993-12-22
Filing date: 1993-12-22
Publication date: 1995-07-21

Abstract

PURPOSE: To provide a table structure extracting device that can correctly and accurately extract ruled lines, intersections, etc. from printed documents and can also omit the preprocessing for tilt correction. CONSTITUTION: A black pixel run detecting part 12 scans the compressed binary data and detects horizontal and vertical black pixel runs. A segment extracting part 13 constructs paths from continuously adjacent horizontal and vertical black pixel runs and extracts the horizontal and vertical segments on those paths by a straight line deciding rule. A ruled line candidate constructing part 21 successively detects continuously adjacent horizontal and vertical segments. It finds the segment adjacent to the current one from the overlap between the two segments and from the continuously adjacent black pixel runs between them. It then obtains horizontal and vertical ruled line candidates from the detected horizontal and vertical segments, respectively. A table structure extracting part 31 checks for crossings between rectangular intersection candidates. It examines the vertical ruled line candidates with reference to the horizontal ruled line candidates, and vice versa. The genuine intersections and the genuine horizontal and vertical ruled lines are thus obtained.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a table structure extracting device for extracting ruled lines, intersections, and the like in a printed document.

[0002]

2. Description of the Related Art
One conventional table structure extraction technique is described, for example, in "A Study on Table Recognition with Complex Structure" (Proceedings of the 37th National Convention of the Information Processing Society of Japan, latter half of 1988). FIG. 2 shows the system configuration of this conventional example.

In the figure, 10 is a tilt correction unit. To deal with skew in the input image, it detects the tilt of the input image with a tilt correction algorithm and normalizes it.

20 is a ruled line candidate extraction unit that uses the marginal distribution method (projection profile). In this method, histograms of the image, mainly a horizontal histogram and a vertical histogram, serve as the basis for ruled line extraction. Tilt correction first removes the skew of the paper and any oblique lines. Unit 20 then projects the corrected image horizontally and vertically and takes as ruled line candidates the positions whose histogram height is at or above a threshold.

30 is a ruled line extraction unit. For a CLOSE-type table, it extracts as ruled lines those candidates whose both ends touch the outer frame ruled lines. For an OPEN-type table, which has no outer frame ruled lines, it adds outer frame ruled lines for convenience.

40 is a block division unit. It divides the image into blocks along the extracted ruled lines and sends each block region back to the ruled line candidate extraction unit 20 for ruled line extraction. This recursion continues until no ruled line candidates remain in any block region.
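For comparison with the invention, the projection-profile step of the conventional unit 20 can be sketched as follows. This minimal Python illustration assumes the image has already been deskewed and is given as a list of 0/1 rows (1 = black). The function name `projection_candidates` and the threshold value are hypothetical, and the recursive block division of unit 40 is not modeled.

```python
def projection_candidates(image, threshold):
    """Return (rows, cols) whose black-pixel counts are at or above threshold.

    image: list of equal-length rows of 0/1 ints (1 = black pixel).
    """
    # Horizontal projection: number of black pixels in each row.
    row_hist = [sum(row) for row in image]
    # Vertical projection: number of black pixels in each column.
    col_hist = [sum(col) for col in zip(*image)]
    rows = [y for y, h in enumerate(row_hist) if h >= threshold]
    cols = [x for x, h in enumerate(col_hist) if h >= threshold]
    return rows, cols


# A 5x6 boxed table with some "text" pixels in column 2.
table = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
]
print(projection_candidates(table, 5))
```

Column 2 collects four black pixels from text alone. A global histogram threshold therefore cannot tell text from a short ruled line, which is the weakness the second problem in [0005] describes.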

[0003] With the above configuration, the table structure in a printed document is extracted.

[0004]

Problems to be Solved by the Invention
However, the conventional apparatus described above has the following problems.
First problem: As shown in FIG. 3, tilt correction (preprocessing) must be performed before ruled line candidates can be extracted by the marginal distribution method. This takes time, because a sine (sin) computation must be performed for every pixel. Here, the per-pixel sine computation means the following.

[0005] Let A be the detected tilt angle. For each pixel (X, Y), compute $X' = X\cos A - Y\sin A$ and $Y' = X\sin A + Y\cos A$, and put the pixel value of (X, Y) into (X', Y').
Second problem: Some methods repeatedly divide the image into blocks along ruled lines and take as ruled line candidates the positions whose histogram height exceeds a threshold. Such methods have difficulty extracting an irregular table like the one in FIG. 4(1). The horizontal histogram at arrow 1 is smaller than the horizontal histogram at arrow 2, yet there is a ruled line at arrow 1 and none at arrow 2. FIGS. 4(2) to 4(6) show the extraction process of the conventional technique and the erroneous results caused by its limitations.
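The per-pixel rotation in [0005] can be sketched as below to show why the text calls it costly: every pixel needs its own products with $\sin A$ and $\cos A$. This is a hypothetical illustration, not the conventional system's code. It assumes a 0/1 image given as a list of rows with the origin at the top-left pixel. It also rounds to the nearest pixel, copies only black pixels, and drops pixels that leave the frame.

```python
import math


def rotate_pixels(image, angle_deg):
    """Forward-map each black pixel (X, Y) to (X', Y') for tilt angle A.

    X' = X cos A - Y sin A,  Y' = X sin A + Y cos A
    Rounding, black-only copying and edge clipping are sketch assumptions.
    """
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if image[y][x]:
                # Every pixel needs its own sin/cos products: this is the
                # per-pixel cost the first problem refers to.
                xp = round(x * cos_a - y * sin_a)
                yp = round(x * sin_a + y * cos_a)
                if 0 <= xp < w and 0 <= yp < h:
                    out[yp][xp] = 1
    return out


img = [[0, 1, 0],
       [0, 0, 0],
       [0, 0, 0]]
print(rotate_pixels(img, 90))
```

The work grows with the number of pixels in the page. The invention tries to avoid this by tolerating small tilts directly in its straight-line rule.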

[0006] Third problem: The conventional method extracts ruled line candidates by the marginal distribution method (projection profile) and then keeps as ruled lines those whose both ends touch the sides of a block. This cannot determine whether the segments of a ruled line are actually present, or how the lines actually cross. Consider a table like the one in FIG. 5, where the vertical line is missing from the center cell. The extraction is only approximate there, and misrecognition results.

[0007] The present invention has been made to solve the above problems and to provide a table structure extracting device that obtains correct extraction results.

[0008]

Means for Solving the Problems
To achieve the above object, the invention of claim 1 is a table structure extracting device that extracts the table structure in a printed document using the overlap and adjacency relations of black pixel runs. The device comprises the following units.

An image compression unit takes, for the input original binary digital data, the logical OR of the n*n pixel values inside each n*n square as one pixel.

A black pixel run detection unit scans the compressed binary digital data horizontally and vertically to detect horizontal black pixel runs and vertical black pixel runs.

A line segment extraction unit takes the vertical and horizontal black pixel runs, detected by the black pixel run detection unit, that are judged to be continuously adjacent. From them it constructs paths containing no intersections, and then extracts the horizontal and vertical line segments on those paths by a straight line discrimination rule.

A ruled line candidate construction unit works on the horizontal and vertical line segments extracted by the line segment extraction unit. It successively detects continuously adjacent horizontal and vertical line segments, finding each next adjacent line segment from the overlap between segments and from the continuously adjacent black pixel runs. It then constructs horizontal ruled line candidates and vertical ruled line candidates from the detected horizontal and vertical line segments, respectively.

A table structure extraction unit works on the horizontal and vertical ruled line candidates constructed by the ruled line candidate construction unit. Using the overlapping, continuously adjacent black pixel runs before and after each candidate, it determines whether any portion extends forward or backward from that ruled line. It then performs a crossing check on the vertical ruled line candidates with reference to the horizontal ruled line candidates, or on the horizontal ruled line candidates with reference to the vertical ruled line candidates. This check detects the true horizontal ruled lines, vertical ruled lines, and intersections.

[0009] In the invention of claim 2, the line segment extraction unit constructs paths from the continuously adjacent vertical black pixel runs and horizontal black pixel runs, respectively. It then extracts the horizontal and vertical line segments on each path by a straight line discrimination rule that examines, in turn, the start point and end point coordinates of each black pixel run on the path.
In the invention of claim 3, the ruled line candidate construction unit uses two methods to successively detect continuously adjacent horizontal and vertical line segments. The first detects the next adjacent line segment from the overlap between segments and the continuously adjacent black pixel runs. The second handles places where a character touches a ruled line by comparing the widths of adjacent line segments. The unit then constructs horizontal and vertical ruled line candidates from the detected horizontal and vertical line segments.

[0010] In the invention of claim 4, the table structure extraction unit first obtains two kinds of rectangular intersection candidates for the horizontal and vertical ruled line candidates. The first kind extends forward or backward from a ruled line; it is found from the overlapping, continuously adjacent black pixel runs before and after the candidate. The second kind lies between the line segments that make up the candidate. The unit then checks for crossings between rectangular intersection candidates, examining the vertical ruled line candidates with reference to the horizontal ones, or the horizontal ruled line candidates with reference to the vertical ones. This detects the true horizontal ruled lines, vertical ruled lines, and intersections.

[0011]

Operation
With the above configuration, the invention of claim 1 operates as follows in a table structure extracting device that extracts the table structure in a printed document. The device relies on the overlap and adjacency relations of black pixel runs, each consisting of one or more consecutive black pixels.

The image compression unit takes, for the input original binary digital data, the logical OR of the n*n pixel values inside each n*n square as one pixel. Here * is used in place of × to avoid confusion with the letter X and denotes multiplication, and n is an integer such as 2, 3, 4, .... Consequently, if even one pixel in the square is 1, the compressed n*n data becomes 1.

The black pixel run detection unit scans the compressed binary digital data horizontally and vertically to detect horizontal and vertical black pixel runs.

The line segment extraction unit constructs paths from the continuously adjacent vertical and horizontal black pixel runs, respectively. Building these paths first removes the portions that are intersections. The unit then extracts the horizontal and vertical line segments on the constructed paths by the straight line discrimination rule.

The ruled line candidate construction unit takes the extracted horizontal and vertical line segments and successively detects the continuously adjacent ones. It finds each next adjacent portion from the overlap between segments and the continuously adjacent black pixel runs. It then constructs horizontal ruled line candidates and vertical ruled line candidates from the detected horizontal and vertical line segments. They remain candidates because some lines may finally be judged not to form part of the table.

The table structure extraction unit examines the constructed horizontal and vertical ruled line candidates. Using the overlapping, continuously adjacent black pixel runs before and after each candidate, it determines whether any portion extends forward or backward from that ruled line. A table structure is formed by horizontal and vertical lines crossing each other. The unit therefore performs a crossing check on the vertical ruled line candidates using the horizontal ones, or on the horizontal ruled line candidates using the vertical ones. This detects the true horizontal ruled lines, vertical ruled lines, and intersections.
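The OR-based compression performed by the image compression unit can be sketched as follows. This minimal Python version assumes a 0/1 image given as a list of rows. The function name `or_compress` is hypothetical. Edge blocks smaller than n*n (when the image size is not a multiple of n) are simply OR-ed as they are; the text does not address this case, so that handling is an assumption.

```python
def or_compress(image, n):
    """OR each n*n block of a 0/1 image into a single output pixel.

    A block becomes 1 if any of its pixels is 1. Partial edge blocks are
    OR-ed as they are (an assumption of this sketch).
    """
    h, w = len(image), len(image[0])
    out = []
    for by in range(0, h, n):
        row = []
        for bx in range(0, w, n):
            block_has_black = any(
                image[y][x]
                for y in range(by, min(by + n, h))
                for x in range(bx, min(bx + n, w))
            )
            row.append(int(block_has_black))
        out.append(row)
    return out


img = [[0] * 4 for _ in range(4)]
img[3][3] = 1  # a single black pixel in the bottom-right corner
print(or_compress(img, 2))
```

With n = 4, as in the embodiment of [0014], each output pixel summarizes a 4*4 = 16-pixel square. Because OR keeps any single black pixel, a one-pixel-thick ruled line cannot vanish during compression.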

[0012] In the invention of claim 2, the line segment extraction unit constructs paths from the continuously adjacent vertical and horizontal black pixel runs, respectively. It then extracts the horizontal and vertical line segments on each path by the straight line discrimination rule, which examines the start point and end point coordinates of each black pixel run on the path in turn.
In the invention of claim 3, the ruled line candidate construction unit successively detects continuously adjacent horizontal and vertical line segments using two methods. One detects the next adjacent segment from the overlap between segments and the continuously adjacent black pixel runs. The other handles places where a character touches a ruled line by comparing widths with the adjacent segment. The unit then constructs horizontal and vertical ruled line candidates from the detected horizontal and vertical line segments, respectively.

[0013] In the invention of claim 4, the table structure extraction unit first obtains two kinds of rectangular intersection candidates for the constructed horizontal and vertical ruled line candidates. The first kind extends forward or backward from a ruled line; it is found from the overlapping, continuously adjacent black pixel runs before and after the candidate. The second kind lies between the line segments that make up the candidate. The unit then checks the intersections between rectangular intersection candidates, examining the vertical ruled line candidates with reference to the horizontal ones, or the horizontal ruled line candidates with reference to the vertical ones. This detects the true horizontal ruled lines, vertical ruled lines, and intersections.

[0014]

Embodiment
FIG. 1 is a system configuration diagram of an embodiment of the table structure extracting device according to the present invention. Before this device extracts the table structure, a scanner converts the document (the original document data) into binary digital data. Area identification processing then cuts out the binary digital data of the table portion and stores it in a buffer memory (not shown) of the device. The configuration and operation of each part are described below with reference to FIG. 1.

In the figure, 11 is an image compression unit. It divides the input original binary digital data into square regions of 4*4 pixels and compresses them by taking the logical OR of the 4*4 = 16 pixel values in each square region as one pixel.

12 is a black pixel run detection unit. It scans the compressed image horizontally and vertically to detect horizontal and vertical black pixel runs.

13 is a line segment extraction unit. It constructs paths from the continuously adjacent vertical and horizontal black pixel runs, respectively, and then extracts the horizontal and vertical line segments on those paths by the straight line discrimination rule. The extraction principle is described later with reference to FIGS. 7 and 8.

21 is a ruled line candidate construction unit. It works on the horizontal and vertical line segments extracted by the line segment extraction unit. It successively detects continuously adjacent horizontal and vertical segments, finding each next adjacent segment from the overlap between segments and by tracing the continuously adjacent black pixel runs. Finally, it constructs horizontal and vertical ruled line candidates from the detected segments. The processing flow and an example are described later with reference to FIGS. 9 and 10.

31 is a table structure extraction unit. For each constructed horizontal and vertical ruled line candidate, it defines the portions extending forward or backward from the ruled line, using the overlapping, continuously adjacent black pixel runs before and after the candidate. It then performs a crossing check on the vertical ruled line candidates using the horizontal ones, or on the horizontal ruled line candidates using the vertical ones. This detects the true horizontal ruled lines, vertical ruled lines, and intersections. The processing flow and an example are described later with reference to FIG. 11.

[0015] The operation of each of the above parts is now described concretely and in more detail, in order. First, the operation of the black pixel run detection unit 12 is explained using the compressed image in FIG. 6(1) as an example. Scanning this image horizontally and vertically detects the horizontal black pixel runs shown in FIG. 6(2) and the vertical black pixel runs shown in FIG. 6(3). Here, $(y; x, x')$ denotes a horizontal black pixel run consisting of the black pixels from start point $x$ to end point $x'$ on horizontal scanning line $y$. Likewise, $(x, x'; y)$ denotes a vertical black pixel run with consecutive black pixels from $x$ to $x'$ on vertical scanning line $y$. To simplify the description, mainly horizontal black pixel runs are discussed below; the processing of vertical black pixel runs is in principle the same.
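Run detection as described in [0015] can be sketched in a few lines. The helper names are hypothetical. Runs are returned as tuples that mirror the notation: `(y, x, x_end)` for a horizontal run $(y; x, x')$, and `(x, x_end, y)` for a vertical run $(x, x'; y)$, where `y` is the vertical scanning line (column) and `x..x_end` are rows. End points are inclusive. Vertical runs come from transposing the image and reusing the horizontal scan, in line with the remark that vertical runs are processed the same way.

```python
def horizontal_runs(image):
    """Return horizontal black runs as (y, x, x_end) with inclusive ends."""
    runs = []
    for y, row in enumerate(image):
        w = len(row)
        x = 0
        while x < w:
            if row[x]:
                start = x
                # Extend the run while the next pixel is also black.
                while x + 1 < w and row[x + 1]:
                    x += 1
                runs.append((y, start, x))
            x += 1
    return runs


def vertical_runs(image):
    """Return vertical black runs as (x, x_end, y): rows x..x_end in column y."""
    transposed = [list(col) for col in zip(*image)]
    return [(start, end, col) for col, start, end in horizontal_runs(transposed)]


img = [[1, 1, 0, 1],
       [0, 1, 0, 1]]
print(horizontal_runs(img))
print(vertical_runs(img))
```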

[0016] The line segment extraction unit 13 constructs paths from the continuously adjacent vertical and horizontal black pixel runs, respectively, and extracts the horizontal and vertical line segments on those paths by the straight line discrimination rule. First, paths are constructed from the continuously adjacent vertical and horizontal black pixel runs according to the following rules (1) to (3).
Rule (1): For $R_1 = (y_1; x_1, x_1')$ and $R_2 = (y_2; x_2, x_2')$, $R_1$ and $R_2$ are defined to overlap if $x_1 < x_2'$ and $x_1' > x_2$. FIG. 7 shows an example.

[0017] Rule (2): For $R_1 = (y_1; x_1, x_1')$ and $R_2 = (y_2; x_2, x_2')$, $R_1$ and $R_2$ are judged to be adjacent if $y_2 = y_1 + 1$ and $R_1$ and $R_2$ overlap.
Rule (3): Consider continuously adjacent runs $R_1, R_2, \ldots, R_n$, so called because they consist of many successive adjacent pairs $R_i, R_{i+1}$. If each $R_i$ is adjacent to at most one pixel run on the scanning line before it and at most one on the scanning line after it, then $R_1, R_2, \ldots, R_n$ are defined to form one path. FIG. 8 shows, as an example, two paths formed from continuously adjacent black pixel runs.
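Rules (1) to (3) can be sketched as a path builder over horizontal runs `(y, x, x_end)`. The function names are hypothetical, and the sketch rests on two assumptions. (a) End points are inclusive, as the width formula $x_1' - x_1 + 1$ in [0020] suggests, so the overlap test uses $\le$/$\ge$ rather than the literal strict inequalities; read strictly, two one-pixel runs in the same column would not overlap. (b) A run touching more than one run on the previous or next scanning line is treated as an intersection and left out of every path. This is one reading of "first removing the portions that are intersections" in [0011].

```python
from collections import defaultdict


def overlaps(r1, r2):
    """Rule (1) for runs (y, x, x_end); non-strict comparison is an assumption."""
    return r1[1] <= r2[2] and r1[2] >= r2[1]


def adjacent(r1, r2):
    """Rule (2): r2 lies on the next scanning line and overlaps r1."""
    return r2[0] == r1[0] + 1 and overlaps(r1, r2)


def build_paths(runs):
    """Rule (3): chain runs that touch at most one run above and one below."""
    by_line = defaultdict(list)
    for r in runs:
        by_line[r[0]].append(r)
    nxt = {r: [s for s in by_line[r[0] + 1] if adjacent(r, s)] for r in runs}
    prv = {r: [p for p in by_line[r[0] - 1] if adjacent(p, r)] for r in runs}
    # Runs with a branch or merge are treated as intersections (assumption).
    simple = {r for r in runs if len(nxt[r]) <= 1 and len(prv[r]) <= 1}
    paths = []
    for r in sorted(simple):
        # Start a path only where it cannot be extended backwards.
        if prv[r] and prv[r][0] in simple:
            continue
        path = [r]
        while nxt[path[-1]] and nxt[path[-1]][0] in simple:
            path.append(nxt[path[-1]][0])
        paths.append(path)
    return paths


# A line that forks at y = 2: the forking run (2, 1, 3) is excluded.
runs = [(0, 2, 2), (1, 2, 2), (2, 1, 3), (3, 1, 1), (3, 3, 3), (4, 1, 1)]
for p in build_paths(runs):
    print(p)
```

Splitting at branching runs keeps each path free of intersections. Straight-line testing can then treat each path as a single curve.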

[0018] Next, the straight line segments on each path are extracted according to the straight line discrimination rule. The rule used here is rule (4) below, under which the start point and end point coordinates of each black pixel run on each path are examined in turn.
Rule (4): Let $R_1, R_2, \ldots, R_n$ be part of a path. If the following conditions are satisfied, $R_1, R_2, \ldots, R_n$ are judged to form a straight line segment.
(1) Between adjacent black pixel runs, $|x_i - x_{i+1}| < 1$ and $|x_i' - x_{i+1}'| < 1$ hold.
(2) Take each K successive black pixel runs in turn as one segment (group). In every segment, $|\bar{x} - x_j| < 1$ and $|\bar{x}' - x_j'| < 1$. Here $\bar{R} = (\bar{y}; \bar{x}, \bar{x}')$ is the first black pixel run in the segment, and $R_j$ is any other black pixel run in that segment.
(3) The length is greater than a certain threshold.
(4) The width divided by the length is smaller than a certain threshold.

[0019] Conditions (1) and (2) express that the line segment consists of black pixel runs of constant length whose end points lie at corresponding positions. Printed frames are rarely perfect, so a slight tilt is still judged to form a ruled line as long as the line segment is not curved. For example, with K = 14, a tilt of up to $\tan^{-1}(1/14)$ (roughly 4 degrees) is treated as a possible ruled line.
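As a quick numerical check of the K = 14 example in [0019], reading it as a drift of one pixel over 14 scanning lines:

$$\tan^{-1}\!\left(\frac{1}{14}\right) \approx 0.0713\ \text{rad} \approx 4.09^\circ$$

so the "4 degrees" figure is a rounded value.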

[0020] Conditions (3) and (4) express that the line segment is elongated within a certain range. The length of $R_1, R_2, \ldots, R_n$ is $n$, and the width is the length of the first black pixel run, that is, $x_1' - x_1 + 1$. In the example of FIG. 8, one vertical line segment is extracted from the right path and two vertical line segments are extracted from the left path.

[0021] The ruled line candidate construction unit 21 successively detects continuously adjacent horizontal and vertical line segments. It finds each next adjacent line segment from the overlap between segments and from the continuously adjacent black pixel runs. It then constructs horizontal and vertical ruled line candidates from the detected horizontal and vertical line segments, respectively. FIG. 9 shows the operation flow, which is described in detail below.
(S1) If there is no "uninspected" line segment, the operation ends; otherwise, go to (S2).
(S2) Among the "uninspected" line segments, take the frontmost one, judged by the y value of its first black pixel run, as the starting line segment.
(S3) Detect the next adjacent line segment from the overlap between line segments and the continuously adjacent black pixel runs. The rule and procedure for this are as follows.

[0022] Rule (5): For line segments L₁ and L₂, let R_last be the last black pixel run forming L₁, on scanline y_last. Let R_first be the frontmost black pixel run forming L₂, on scanline y_first. L₂ is judged to be the adjacent line segment of L₁ if all of the following conditions hold:
(1) y_last < y_first, and R_last and R_first overlap.
(2) Among the line segments satisfying condition (1), L₂ has the smallest y_first.
(3) R_last is continuously adjacent to y_first via runs R₁, R₂, ... R_n, and R₁, R₂, ... R_n all overlap R_last.
(4) L₂ is not already the adjacent line segment of another line segment.
FIG. 10 shows an example of applying this rule to vertical line segments. In case (1) of the figure, L₂ is not an adjacent line segment of L₁ because R₂ and R_last do not overlap. Case (2) shows L₂ judged to be the adjacent line segment of L₁.
(S4) If no adjacent line segment is detected, ruled line candidates are judged in turn for each of L₁, L₂, ... L_n.
(S5) If an adjacent line segment is detected, the width of the current line segment is compared with that of the adjacent line segment. This identifies line segments where a character touches a ruled line, since a line segment becomes wider at such a contact. Here, the width of a line segment is the length of the frontmost black pixel run forming it.
(S6) If the difference between the two widths is small, the segment is recorded as an adjacent line segment.
(S7) The width of this adjacent line segment is checked against the width of the starting line segment. If they differ, the segment is marked "width not the same", and the process returns to (S3) to detect the next adjacent line segment.
(S8) If the current line segment is narrower than the adjacent line segment, the adjacent line segment is regarded as one where a character touches the ruled line. That segment is marked as inspected, and the process returns to (S3) to detect again.
(S9) If the current line segment is wider than the adjacent line segment, the current line segment is regarded as one where a character touches the ruled line and is recorded as "non-ruled line". The process then returns to (S3) to detect the adjacent line segment of the recorded "non-ruled line" segment.
(S10) If no adjacent line segment can be detected in (S3), rule (6) below decides whether a ruled line candidate can be constructed from the recorded, continuously adjacent line segments.
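The Python sketch below illustrates the adjacency test of rule (5) and the width comparison of steps (S5) to (S9). It rests on several assumptions that are not in the source:
- A black pixel run is a tuple (y, x_start, x_end) with inclusive pixel coordinates.
- Two runs "overlap" when their x intervals share at least one pixel.
- A line segment is a list of runs in scanline order.
- The names `next_adjacent`, `compare_widths` and `width_tol` are hypothetical.

Condition (2) is applied before conditions (3) and (4), following the order in which the rule lists them. Recording marks such as "width not the same" is left to the caller.

```python
from typing import Dict, List, Optional, Set, Tuple

Run = Tuple[int, int, int]   # (y, x_start, x_end), inclusive (assumed)
Segment = List[Run]          # runs in increasing y order


def overlaps(a: Run, b: Run) -> bool:
    """Assumed overlap test: the x intervals share at least one pixel."""
    return a[1] <= b[2] and b[1] <= a[2]


def bridged(r_last: Run, r_first: Run, runs_by_y: Dict[int, List[Run]]) -> bool:
    """Condition (3): a chain of continuously adjacent runs leads from r_last
    to scanline y_first, and every run in the chain overlaps r_last."""
    frontier = [r_last]
    for y in range(r_last[0] + 1, r_first[0]):
        frontier = [r for r in runs_by_y.get(y, [])
                    if overlaps(r, r_last) and any(overlaps(r, f) for f in frontier)]
        if not frontier:
            return False
    return any(overlaps(f, r_first) for f in frontier)


def next_adjacent(i: int, segments: List[Segment],
                  runs_by_y: Dict[int, List[Run]], claimed: Set[int]) -> Optional[int]:
    """Index of the segment adjacent to segments[i] under rule (5), or None."""
    r_last = segments[i][-1]
    # Condition (1): the segment starts below R_last and its first run overlaps R_last.
    cands = [j for j, s in enumerate(segments)
             if j != i and r_last[0] < s[0][0] and overlaps(r_last, s[0])]
    if not cands:
        return None
    # Condition (2): keep the candidate with the smallest y_first.
    j = min(cands, key=lambda k: segments[k][0][0])
    # Conditions (4) and (3).
    if j in claimed or not bridged(r_last, segments[j][0], runs_by_y):
        return None
    return j


def width(seg: Segment) -> int:
    """Width of a segment: the length of its frontmost run, as in (S5)."""
    return seg[0][2] - seg[0][1] + 1


def compare_widths(cur: Segment, adj: Segment, width_tol: int = 1) -> str:
    """Steps (S6), (S8), (S9); width_tol stands in for "small" (assumed)."""
    d = width(cur) - width(adj)
    if abs(d) <= width_tol:
        return "adjacent"               # (S6) record as adjacent segment
    if d < 0:
        return "adjacent_touches_char"  # (S8) adjacent segment touches a character
    return "current_touches_char"       # (S9) current segment is "non-ruled line"
```

`runs_by_y` maps each scanline to all runs detected on it, so the gap between two segments can be bridged by runs that belong to neither segment.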

[0023] Rule (6): For continuously adjacent line segments L₁, L₂, ... L_n, a ruled line candidate is judged to be constructed from L₁, L₂, ... L_n if the following conditions are satisfied:
(1) No segment carries the mark "width not the same" or "non-ruled line".
(2) The length is greater than a certain threshold.
(3) The width divided by the length is smaller than a certain threshold.
In rule (6), conditions (2) and (3) express that the ruled line candidate is slender. Here, the length from L₁ to L_n is the y value of the last black pixel run forming L_n, minus the y value of the frontmost black pixel run forming L₁, plus 1. The width of the ruled line candidate from L₁ to L_n is the length of the frontmost black pixel run forming L₁.
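Rule (6), with the length and width defined above, reduces to a few comparisons. The sketch below assumes runs are (y, x_start, x_end) tuples with inclusive coordinates. It also assumes the marks recorded during (S7) and (S9) are passed in as strings. The two threshold values are hypothetical, since the source does not give them.

```python
from typing import Iterable, List, Tuple

Run = Tuple[int, int, int]   # (y, x_start, x_end), inclusive (assumed)
Segment = List[Run]

BLOCKING_MARKS = {"width not the same", "non-ruled line"}


def forms_ruled_line_candidate(chain: List[Segment], marks: Iterable[str],
                               min_length: int = 20,
                               max_ratio: float = 0.1) -> bool:
    """Rule (6) for continuously adjacent segments L1..Ln.

    marks: every mark recorded on any segment of the chain.
    min_length and max_ratio are hypothetical thresholds.
    """
    if BLOCKING_MARKS & set(marks):                    # condition (1)
        return False
    first_run, last_run = chain[0][0], chain[-1][-1]
    length = last_run[0] - first_run[0] + 1            # last y of Ln - first y of L1 + 1
    width = first_run[2] - first_run[1] + 1            # length of L1's frontmost run
    return length > min_length and width / length < max_ratio   # (2) and (3)
```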

[0024] In FIG. 1, the table structure extraction unit 31 processes the horizontal and vertical ruled line candidates above. For each candidate, it uses the overlap and the continuously adjacent black pixel runs before and after the candidate to detect (define) two kinds of rectangular intersection candidates: those extending forward and backward from the ruled line, and those between the line segments that make up the ruled line candidate. It then performs an intersection check between the rectangular intersection candidates, using the horizontal ruled line candidates against the vertical ones or the vertical ruled line candidates against the horizontal ones. The ruled lines are confirmed by detecting the true horizontal ruled lines, vertical ruled lines and intersections.

[0025] Specifically, rectangular intersection candidates are detected for each horizontal and vertical ruled line candidate according to rule (7) below. By avoiding geometric calculation, this approach also shortens the time needed to decide whether an intersection exists.
Rule (7): For a ruled line candidate composed of line segment candidates L₁, L₂, ... L_i, L_{i+1}, ... L_n, each rectangle formed as follows is taken as an intersection candidate. The description refers to FIG. 11.
(1) Between L_i and L_{i+1}: let R_i be the last black pixel run forming L_i ("later" means a larger y coordinate of the scanline). Let R_{i+1} be the frontmost black pixel run forming L_{i+1} ("front" means a smaller y coordinate of the scanline). The candidate is the rectangle spanning x_i to x_i′ and y_i to y_{i+1}.
(2) Let R₁ be the frontmost black pixel run forming L₁. Black pixel runs that overlap R₁ and are continuously adjacent are detected in the forward direction, as far as y_u at most. The candidate is the rectangle spanning x₁ to x₁′ and y_u to y₁.
(3) Let R_n be the last black pixel run forming L_n. Black pixel runs that overlap R_n and are continuously adjacent are detected in the backward direction, as far as y_b at most. The candidate is the rectangle spanning x_n to x_n′ and y_n to y_b.
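One possible reading of rule (7) for a vertical ruled line candidate is sketched in Python below. It assumes the following:
- Runs are (y, x_start, x_end) tuples with inclusive coordinates.
- "Overlap" means shared x pixels.
- y_u and y_b are the scanlines of the farthest runs reached by following overlapping, continuously adjacent runs.
- If no such run exists at one end, no candidate rectangle is produced there.

Rectangles are returned as (x0, x1, y0, y1). All names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Run = Tuple[int, int, int]            # (y, x_start, x_end), inclusive (assumed)
Segment = List[Run]
Rect = Tuple[int, int, int, int]      # (x0, x1, y0, y1)


def overlaps(a: Run, b: Run) -> bool:
    return a[1] <= b[2] and b[1] <= a[2]


def extend(run: Run, runs_by_y: Dict[int, List[Run]], step: int) -> Optional[int]:
    """Follow runs that overlap `run` and are continuously adjacent, moving by
    `step` (-1 = forward/smaller y, +1 = backward/larger y). Returns the y of
    the farthest run reached, or None if there is none."""
    frontier, y, reached = [run], run[0], None
    while True:
        y += step
        frontier = [r for r in runs_by_y.get(y, [])
                    if overlaps(r, run) and any(overlaps(r, f) for f in frontier)]
        if not frontier:
            return reached
        reached = y


def intersection_candidates(segments: List[Segment],
                            runs_by_y: Dict[int, List[Run]]) -> List[Rect]:
    rects: List[Rect] = []
    # (1) The gap between consecutive segments L_i and L_i+1.
    for seg, nxt in zip(segments, segments[1:]):
        r_i, r_next = seg[-1], nxt[0]
        rects.append((r_i[1], r_i[2], r_i[0], r_next[0]))
    # (2) Extension in front of L1, as far as y_u.
    r1 = segments[0][0]
    y_u = extend(r1, runs_by_y, -1)
    if y_u is not None:
        rects.append((r1[1], r1[2], y_u, r1[0]))
    # (3) Extension behind L_n, as far as y_b.
    rn = segments[-1][-1]
    y_b = extend(rn, runs_by_y, +1)
    if y_b is not None:
        rects.append((rn[1], rn[2], rn[0], y_b))
    return rects
```

Only scanline lookups and interval comparisons are used here, in keeping with the stated aim of avoiding geometric calculation.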

[0026] FIG. 11(2) shows an example of a vertical ruled line candidate defined as above. The intersection check between rectangular intersection candidates is then performed on the vertical ruled line candidates using the horizontal ruled line candidates. In the figure, 1, 2 and 3 correspond to (1), (2) and (3) of rule (7) above, respectively. This check yields the true intersections as well as the true horizontal and vertical ruled lines, so the table structure can be obtained.
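The source does not spell out how two rectangular intersection candidates are tested against each other. The Python sketch below makes two assumptions:
- It maps both kinds of rectangle into page coordinates and applies a plain axis-aligned overlap test.
- A horizontal ruled line candidate's rectangles are built from vertical runs, so their axes are swapped relative to those of a vertical candidate.

Rectangles are written (across_start, across_end, along_start, along_end), and all names are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]   # (across0, across1, along0, along1)
Box = Tuple[int, int, int, int]    # page box (x0, x1, y0, y1), inclusive


def to_page_box(rect: Rect, horizontal: bool) -> Box:
    """Map a candidate rectangle to page coordinates.

    Vertical candidates: across = x, along = y.
    Horizontal candidates (built from vertical runs): across = y, along = x.
    """
    a0, a1, s0, s1 = rect
    return (s0, s1, a0, a1) if horizontal else (a0, a1, s0, s1)


def boxes_intersect(p: Box, q: Box) -> bool:
    return p[0] <= q[1] and q[0] <= p[1] and p[2] <= q[3] and q[2] <= p[3]


def true_intersections(h_rects: List[Rect], v_rects: List[Rect]) -> List[Tuple[int, int]]:
    """Pairs (h_index, v_index) of candidates whose rectangles cross."""
    v_boxes = [to_page_box(v, horizontal=False) for v in v_rects]
    hits = []
    for hi, h in enumerate(h_rects):
        hb = to_page_box(h, horizontal=True)
        for vi, vb in enumerate(v_boxes):
            if boxes_intersect(hb, vb):
                hits.append((hi, vi))
    return hits
```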

[0027] The operation of the table structure extraction device of this embodiment, configured as described above, is explained below using the processing example of FIG. 12. First, the image compression unit 11 compresses the input binary digital data; FIG. 12(1) shows the result. The black pixel run detection unit 12 then scans the compressed binary digital data horizontally and vertically to detect horizontal black pixel runs and vertical black pixel runs, as shown in FIG. 12(2).
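The compression described in effect (3) and the run detection of unit 12 can be sketched as follows. The image is assumed to be a list of rows of 0/1 values, with 1 meaning black. Item (2) of [0031] suggests the embodiment uses 4*4 blocks, so n defaults to 4. Blocks at the right and bottom edges that are smaller than n*n are OR-ed over the pixels they contain, which is an assumption. Runs are (line, start, end) tuples with inclusive coordinates; for vertical runs, `line` is the column x.

```python
from typing import List, Tuple

Image = List[List[int]]            # rows of 0/1 pixels, 1 = black
Run = Tuple[int, int, int]         # (line, start, end), inclusive


def compress(img: Image, n: int = 4) -> Image:
    """OR each n*n block of pixels into one output pixel."""
    h, w = len(img), len(img[0])
    return [[int(any(img[y][x]
                     for y in range(by, min(by + n, h))
                     for x in range(bx, min(bx + n, w))))
             for bx in range(0, w, n)]
            for by in range(0, h, n)]


def horizontal_runs(img: Image) -> List[Run]:
    """Maximal horizontal runs of black pixels as (y, x_start, x_end)."""
    runs = []
    for y, row in enumerate(img):
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                while x < len(row) and row[x]:
                    x += 1
                runs.append((y, start, x - 1))
            else:
                x += 1
    return runs


def vertical_runs(img: Image) -> List[Run]:
    """Maximal vertical runs as (x, y_start, y_end), via the transpose."""
    return horizontal_runs([list(col) for col in zip(*img)])
```

For example, a top row `[1, 1, 1, 0, 1, 1, 1, 1]` compressed with n = 2 becomes `[1, 1, 1, 1]`. The one-pixel break disappears, which illustrates the break-correcting effect the source attributes to the OR operation.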

[0028] The line segment extraction unit 13 constructs paths from continuously adjacent vertical black pixel runs and from continuously adjacent horizontal black pixel runs, respectively. It then extracts the true horizontal and vertical line segments on each path by applying the straight line discrimination rule, which inspects the start point and end point coordinate values of each black pixel run on the path in order. FIG. 12(3) shows the result: nine horizontal line segments and eight vertical line segments are extracted.

[0029] The ruled line candidate construction unit 21 processes the horizontal and vertical line segments with two methods. The first detects the next adjacent line segment from the overlap between line segments and the continuously adjacent black pixel runs. The second handles contact between characters and ruled lines by comparing widths with the adjacent line segment. Together they detect, in order, the successively adjacent horizontal and vertical line segments that serve as ruled line candidates. FIG. 12(4) shows the result. In this case, three horizontal ruled line candidates and four vertical ruled line candidates are constructed from the detected horizontal and vertical line segments, respectively.

[0030] The table structure extraction unit 31 processes the horizontal and vertical ruled line candidates above. For each candidate, it uses the overlap and the continuously adjacent black pixel runs before and after the candidate to determine whether a portion extends forward or backward from the ruled line. On that basis it finds the rectangular intersection candidates, including those between the line segments that make up each ruled line candidate. FIG. 12(4) shows these candidates, which in this case are 24 hollow rectangles. The intersection check between rectangular intersection candidates is then performed, either on the vertical ruled line candidates using the horizontal ones or on the horizontal ruled line candidates using the vertical ones. This check determines the horizontal ruled lines, the vertical ruled lines and their intersections, as shown in FIG. 12(5). In this case, three horizontal ruled lines, four vertical ruled lines and twelve intersections are finally obtained correctly. Through the above processing, accurate tables are extracted from the tables shown in FIGS. 4 and 5, as shown in FIGS. 13 and 14, respectively.

[0031] Although the present invention has been described above based on an embodiment, it is of course not limited to that embodiment. For example, the following are also included in the present invention:
(1) The input data is ordinary binary data, and the table is extracted from it as is.
(2) The logical OR operation for image compression uses square areas of 3*3 pixels instead of 4*4 pixels.
(3) The term "black pixel" in this specification means a pixel forming a ruled line, and is not limited to the color black in the colorimetric sense. For example, if ruled lines are printed in green on white paper, the pixels forming the green color are also "black pixels".
(4) For manufacturing or other convenience, one indispensable component (requirement) of the present invention is divided into several components. Conversely, several components may be integrated into one, or these variations may be combined as appropriate.

【００３２】[0032]

[Effects of the Invention] As described above, the present invention provides the following effects.
(1) Horizontal and vertical ruled line candidates are constructed by the line segment candidate extraction unit and the ruled line candidate detection unit. This removes the problem that tilt correction (pre-processing), previously indispensable, takes a long time.
(2) The intersection check in the table structure extraction unit yields the true horizontal ruled lines, vertical ruled lines and intersections. This solves two problems: irregular tables could not be extracted, and neither the presence or absence of segments in a ruled line nor the actual crossing situation could be determined.
(3) For the input original binary digital data, the image compression unit takes the logical OR of the n*n pixel values inside each n*n square (n = 2, 3, 4, ...) and uses the result as one pixel. This reduces the amount of data to be processed and allows breaks in ruled lines to be corrected.

[0033] Accordingly, the present invention can be applied to office automation equipment such as form recognition systems, where its practical benefit is very large.

[Brief description of drawings]

FIG. 1 is a configuration diagram of an embodiment of the table structure extraction device according to the present invention.

FIG. 2 is a configuration diagram of a conventional table structure extraction device.

FIG. 3 is a diagram showing an example of applying the conventional peripheral distribution (projection profile) method to a tilted image.

FIG. 4 is a diagram showing a table structure extraction process and its result according to a conventional technique.

FIG. 5 is a diagram showing a table structure extraction process and its result according to a conventional technique.

FIG. 6 is a diagram showing a black pixel run.

FIG. 7 is a diagram explaining the definition of overlap between black pixel runs.

FIG. 8 is a diagram showing an example of constructing a path.

FIG. 9 is an operation flow chart of the ruled line candidate construction unit in the embodiment of the present invention.

FIG. 10 is a diagram explaining adjacent line segments.

FIG. 11 is a diagram explaining rectangular intersection candidates.

FIG. 12 is a diagram showing the table structure extraction process and its result in the embodiment of the present invention.

FIG. 13 is a diagram showing the table structure extraction process and its result in the embodiment of the present invention.

FIG. 14 is a diagram showing the table structure extraction process and its result in the embodiment of the present invention.

[Explanation of symbols]

11 Image compression unit
12 Black pixel run detection unit
13 Line segment extraction unit
21 Ruled line candidate construction unit
31 Table structure extraction unit

─────────────────────────────────────────────────────

[Written Amendment]

[Submission date] December 8, 1994

[Procedural Amendment 1]

[Document to be amended] Specification

[Item to be amended] 0017

[Method of amendment] Change

[Content of amendment]

[0017] Rule (2): For R₁ = (y₁; x₁, x₁′) and R₂ = (y₂; x₂, x₂′), if y₂ = y₁ + 1 and R₁ and R₂ overlap, R₁ and R₂ are judged to be adjacent.
Rule (3): Consider continuously adjacent runs R₁, R₂, ... R_n, so called because they consist of many pairs of mutually adjacent runs R_i, R_{i+1}. If each R_i is adjacent to at most one pixel run on each of the scanlines immediately before and after it, then R₁, R₂, ... R_n are defined to form one path. As an example, FIG. 8 shows two paths formed from continuously adjacent black pixel runs.
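Rules (2) and (3) can be sketched in Python as below. The sketch assumes that runs are (y, x_start, x_end) tuples with inclusive coordinates. It also assumes that "overlap" means the x intervals share at least one pixel; the exact definition accompanies FIG. 7, which is not reproduced here. A run adjacent to more than one run on the previous or next scanline violates the condition of rule (3). This sketch leaves such runs out of every path, which is one reading of the rule and not necessarily the original one. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Run = Tuple[int, int, int]   # (y, x_start, x_end), inclusive (assumed)


def overlaps(a: Run, b: Run) -> bool:
    """Assumed overlap test: the x intervals share at least one pixel."""
    return a[1] <= b[2] and b[1] <= a[2]


def adjacent(a: Run, b: Run) -> bool:
    """Rule (2): b lies on the scanline right after a and the runs overlap."""
    return b[0] == a[0] + 1 and overlaps(a, b)


def build_paths(runs: List[Run]) -> List[List[Run]]:
    """Rule (3): chains of adjacent runs in which every run touches at most
    one run on the previous scanline and at most one on the next."""
    by_y: Dict[int, List[Run]] = defaultdict(list)
    for r in runs:
        by_y[r[0]].append(r)
    down = {r: [s for s in by_y[r[0] + 1] if adjacent(r, s)] for r in runs}
    up = {r: [s for s in by_y[r[0] - 1] if adjacent(s, r)] for r in runs}
    eligible = {r for r in runs if len(up[r]) <= 1 and len(down[r]) <= 1}
    paths = []
    for r in sorted(eligible):
        if up[r] and up[r][0] in eligible:
            continue  # r continues a path that starts on an earlier scanline
        path = [r]
        while down[path[-1]] and down[path[-1]][0] in eligible:
            path.append(down[path[-1]][0])
        paths.append(path)
    return paths
```

Consider runs `(0, 0, 0)`, `(0, 4, 4)`, `(1, 0, 4)` and `(2, 0, 0)`. The wide run on scanline 1 touches two runs above it, so in this sketch it breaks the stroke into separate paths. This resembles how a crossing ruled line interrupts a vertical stroke.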

[Procedural Amendment 2]

[Document to be amended] Specification

[Item to be amended] 0018

[Method of amendment] Change

[Content of amendment]

[0018] Next, straight line segments on each path are extracted according to the straight line discrimination rule. The rule used here is rule (4) below. For each path, the start point and end point coordinate values of each black pixel run are inspected in order.
Rule (4): For runs R₁, R₂, ... R_n forming part of a path, R₁, R₂, ... R_n are judged to form a straight line segment if the following conditions are satisfied:
(1) For every pair of adjacent black pixel runs, |x_i − x_{i+1}| < 1 and |x_i′ − x_{i+1}′| < 1.
(2) Take every K consecutive black pixel runs, in order, as one segment. For each segment, |x̄ − x_j| < 1 and |x̄′ − x_j′| < 1. Here R̄ = (ȳ; x̄, x̄′) is the first black pixel run in the segment, and R_j is any other black pixel run in that segment. (The bar denotes a horizontal line over the symbol, here and below.)
(3) The length is greater than a certain threshold.
(4) The width divided by the length is smaller than a certain threshold.
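A direct transcription of rule (4) into Python follows, assuming runs are (y, x_start, x_end) tuples with inclusive coordinates. The bound "< 1" is taken from the text. With integer pixel coordinates it requires equal coordinates, so it is exposed as the parameter `tol`. The source does not give K or either threshold, so the defaults here are hypothetical. Length and width are measured as in rule (6): the last y minus the first y plus 1, and the length of the first run. Using these measures for rule (4) is an assumption.

```python
from typing import List, Tuple

Run = Tuple[int, int, int]   # (y, x_start, x_end), inclusive (assumed)


def close(a: Run, b: Run, tol: float) -> bool:
    """|x_a - x_b| < tol and |x_a' - x_b'| < tol."""
    return abs(a[1] - b[1]) < tol and abs(a[2] - b[2]) < tol


def is_straight(runs: List[Run], k: int = 4, tol: float = 1,
                min_length: int = 10, max_ratio: float = 0.2) -> bool:
    """Rule (4) for a part R1..Rn of a path, with runs in scanline order."""
    # (1) Every pair of adjacent runs.
    if not all(close(a, b, tol) for a, b in zip(runs, runs[1:])):
        return False
    # (2) Each block of k runs, compared against its first run (R-bar).
    for s in range(0, len(runs), k):
        head, rest = runs[s], runs[s + 1:s + k]
        if not all(close(head, r, tol) for r in rest):
            return False
    length = runs[-1][0] - runs[0][0] + 1      # (3) length threshold
    width = runs[0][2] - runs[0][1] + 1        # (4) slenderness
    return length > min_length and width / length < max_ratio
```

Loosening `tol` lets a slightly tilted stroke pass both conditions (1) and (2).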

[Procedural Amendment 3]

[Document to be amended] Specification

[Item to be amended] 0025

[Method of amendment] Change

[Content of amendment]

[0025] Specifically, rectangular intersection candidates are detected for each horizontal and vertical ruled line candidate according to rule (7) below. By avoiding geometric calculation, this approach also shortens the time needed to decide whether an intersection exists.
Rule (7): For a ruled line candidate composed of line segment candidates L₁, L₂, ... L_i, L_{i+1}, ... L_n, each rectangle formed as follows is taken as an intersection candidate. The description refers to FIG. 11(1).
(1) Between L_i and L_{i+1}: let R_i be the last black pixel run forming L_i ("later" means a larger y coordinate of the scanline). Let R_{i+1} be the frontmost black pixel run forming L_{i+1} ("front" means a smaller y coordinate of the scanline). The candidate is the rectangle spanning x_i to x_i′ and y_i to y_{i+1}.
(2) Let R₁ be the frontmost black pixel run forming L₁. Black pixel runs that overlap R₁ and are continuously adjacent are detected in the forward direction, as far as y_u at most. The candidate is the rectangle spanning x₁ to x₁′ and y_u to y₁.
(3) Let R_n be the last black pixel run forming L_n. Black pixel runs that overlap R_n and are continuously adjacent are detected in the backward direction, as far as y_b at most. The candidate is the rectangle spanning x_n to x_n′ and y_n to y_b.

[Procedural Amendment 4]

[Document to be amended] Specification

[Item to be amended] 0026

[Method of amendment] Change

[Content of amendment]

[0026] FIG. 11(2) shows an example of a vertical ruled line candidate defined as above. The intersection check between rectangular intersection candidates is then performed on the vertical ruled line candidates using the horizontal ruled line candidates. In the figure, (1), (2) and (3) correspond to (1), (2) and (3) of rule (7) above, respectively. This check yields the true intersections as well as the true horizontal and vertical ruled lines, so the table structure can be obtained.

[Procedural Amendment 5]

[Document to be amended] Drawings

[Item to be amended] FIG. 11

[Method of amendment] Change

[Content of amendment]

[FIG. 11]

Continuation of the front page. (51) Int.Cl.⁶ / identification code / internal reference number: G06K 9/36, 9/46 A 9289-5L

Claims

[Claims]

1. A table structure extraction device that extracts a table structure from a printed document by using the overlap and adjacency relations of black pixel runs, comprising:
an image compression unit that, for each n*n square of the input original binary digital data, takes the logical OR of the n*n pixel values as one pixel;
a black pixel run detection unit that scans the compressed binary digital data horizontally and vertically to detect horizontal black pixel runs and vertical black pixel runs;
a line segment extraction unit that constructs paths from those vertical and horizontal black pixel runs, among the runs detected by the black pixel run detection unit, that are determined to be continuously adjacent, and that extracts the horizontal line segments and vertical line segments on the constructed paths by a straight line discrimination rule;
a ruled line candidate construction unit that, for the horizontal and vertical line segments extracted by the line segment extraction unit, detects in order the horizontal and vertical line segments that are successively adjacent, using a method of detecting the next adjacent line segment from the overlap between line segments and the continuously adjacent black pixel runs, and that constructs horizontal ruled line candidates and vertical ruled line candidates from the detected horizontal and vertical line segments, respectively; and
a table structure extraction unit that, for the horizontal and vertical ruled line candidates constructed by the ruled line candidate construction unit, determines from the overlap and the continuously adjacent black pixel runs before and after each ruled line candidate whether a portion extends forward or backward from the ruled line, and that, using the horizontal ruled line candidates as reference for the vertical ruled line candidates or the vertical ruled line candidates as reference for the horizontal ruled line candidates, performs a check to detect true horizontal ruled lines, vertical ruled lines and intersections.

2. The table structure extraction device according to claim 1, wherein the line segment extraction unit constructs paths from continuously adjacent vertical black pixel runs and from continuously adjacent horizontal black pixel runs, respectively, and extracts the horizontal line segments and vertical line segments on the paths by a straight line discrimination rule that inspects, in order, the start point coordinate value and the end point coordinate value of each black pixel run on the path.

3. The table structure extraction device according to claim 1, wherein the ruled line candidate construction unit detects, in order, the horizontal and vertical line segments that are successively adjacent, using two methods: a method of detecting the next adjacent line segment from the overlap between line segments and the continuously adjacent black pixel runs, and a method of handling contact between characters and ruled lines by comparing the width of a line segment with that of its adjacent line segment. The unit then constructs horizontal ruled line candidates and vertical ruled line candidates from the detected horizontal and vertical line segments, respectively.

4. The table structure extraction device according to claim 1, wherein the table structure extraction unit finds rectangular intersection candidates for the horizontal and vertical ruled line candidates, based on the overlap and the continuously adjacent black pixel runs before and after each ruled line candidate. These candidates comprise those extending forward and backward from the ruled line and those between the line segments that make up the ruled line candidate. The unit then performs an intersection check between the rectangular intersection candidates, using the horizontal ruled line candidates as reference for the vertical ruled line candidates or the vertical ruled line candidates as reference for the horizontal ruled line candidates, to detect true horizontal ruled lines, vertical ruled lines and intersections.