JPH1097588A

JPH1097588A - Ruled-line recognizing method, table processing method, and recording medium

Info

Publication number: JPH1097588A
Application number: JP8247786A
Authority: JP
Inventors: Goro Bessho; 吾朗別所
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1996-09-19
Filing date: 1996-09-19
Publication date: 1998-04-14

Abstract

PROBLEM TO BE SOLVED: To accurately extract only dotted lines even when characters of similar shape are lined up or when a plurality of dotted lines with narrow line spacing are present. SOLUTION: A document is read as a binary image, which is stored in a memory 2. A rectangle extraction part 3 refers to the memory 2, generates rectangles that each contain all connected black pixels, and stores only rectangles of a size corresponding to a dotted-line element in a rectangle memory 4. A black pixel rate calculation part 5 counts the black pixels in each rectangle and calculates the ratio of the number of black pixels to the rectangle area as a black pixel occupation rate. A dotted-line element decision part 6 decides, on the basis of the black pixel occupation rate, whether or not each rectangle constitutes a dotted line. A dotted ruled line extraction part 8 then integrates, on the basis of the intervals between the dotted-line elements, only the adequate rectangles among those decided to be dotted-line elements, and thereby extracts the dotted ruled line.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a method of recognizing characters and ruled lines in a character image that contains ruled lines, such as a table or a form, and to a table processing method and a recording medium.

[0002]

[Description of the Related Art] In general, when a character recognition apparatus processes a document, the document image is often classified into character areas, table areas, figure areas and other areas, and processing suited to each area is performed. Among the methods for recognizing the ruled lines that make up a table, in addition to processing that recognizes ruled lines consisting of solid lines, a method is known that recognizes dotted lines by integrating the rectangles that enclose black-pixel connected components (see, for example, Japanese Patent Application Laid-Open No. 7-230525).

[0003]

[Problems to be Solved by the Invention] However, with the method described above, when characters or parts of characters that are not dotted lines are printed so that rectangles of similar shape line up at equal intervals, they may be misrecognized as a dotted line. In addition, when a plurality of dotted lines with narrow line spacing exist, a dotted line may also be erroneously recognized in the direction orthogonal to the actual dotted lines.

[0004] An object of the present invention is to provide a ruled line recognition method, a table processing method, and a recording medium that make it possible to accurately extract only dotted lines even when characters of similar shape are lined up or when a plurality of dotted lines with narrow line spacing exist.

[0005]

[Means for Solving the Problems] To achieve the above object, the invention according to claim 1 is a ruled line recognition method that extracts, from a binary image, rectangles each containing all connected black pixels and integrates the rectangles to extract a dotted ruled line, characterized in that validity as a dotted ruled line is determined based on the ratio of black pixels in each extracted rectangle.

[0006] The invention according to claim 2 is a ruled line recognition method that extracts, from a binary image, rectangles each containing all connected black pixels and integrates the rectangles to extract dotted ruled lines, characterized in that a label is attached to each rectangle used to extract a dotted ruled line in the main scanning direction (or the sub-scanning direction) of the image, and the labeled rectangles are not used when extracting dotted ruled lines in the sub-scanning direction (or, respectively, the main scanning direction).

[0007] The invention according to claim 3 is characterized by performing two processes: a process of extracting, from a binary image, black runs whose length is equal to or greater than a predetermined threshold and, when extracted black runs lie within a predetermined threshold distance of each other, integrating those black runs and extracting them as a solid ruled line; and a process of extracting dotted ruled lines by the method according to claim 1 or 2.

[0008] The invention according to claim 4 is characterized in that the processing according to claim 3 is performed in both the main scanning direction and the sub-scanning direction to extract ruled lines in the main scanning direction and ruled lines in the sub-scanning direction, and the extracted ruled lines are combined to recognize frames.

[0009] The invention according to claim 5 is characterized in that the characters within a frame are extracted from the frame region extracted by the processing according to claim 4 and are subjected to character recognition.

[0010] The invention according to claim 6 is characterized in that the coordinate values and the ruled line type of each solid ruled line and dotted ruled line extracted by the processing according to claim 3 are output, and the document is reproduced in accordance with the ruled line types.

[0011] The invention according to claim 7 is characterized in that the character codes recognized by the processing according to claim 5 are output and, at the same time, ruled lines are output by the processing according to claim 6, so that the document is reproduced.

[0012] The invention according to claim 8 is a recording medium on which is recorded a program for causing a computer to realize a function of extracting, from a binary image, rectangles each containing all connected black pixels and integrating the rectangles to extract a dotted ruled line, characterized in that the recorded program realizes a function of determining validity as a dotted ruled line based on the ratio of black pixels in each extracted rectangle.

[0013] The invention according to claim 9 is a recording medium on which is recorded a program for causing a computer to realize a function of extracting, from a binary image, rectangles each containing all connected black pixels and integrating the rectangles to extract dotted ruled lines, characterized in that the recorded program realizes a function of attaching a label to each rectangle used to extract a dotted ruled line in the main scanning direction (or the sub-scanning direction) of the image and of not using the labeled rectangles when extracting dotted ruled lines in the sub-scanning direction (or, respectively, the main scanning direction).

[0014]

[Embodiments of the Invention] Embodiments of the present invention are described in detail below with reference to the drawings.

<Embodiment 1> FIG. 1 shows the configuration of Embodiment 1 of the present invention, and FIG. 2 is its processing flowchart. A binary image input unit 1 such as a scanner reads an original such as a document or a form as a binary image and stores it in a binary image memory 2 (step 101). A rectangle extraction unit 3 scans the binary image memory 2, generates rectangles each of which contains all of a set of connected black pixels, and stores in a rectangle memory 4 only the rectangle data (coordinates of the start point and end point, and so on) of rectangles whose size is appropriate for an element of a dotted line (step 102).
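Step 102 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function name `extract_rectangles`, the use of 8-connectivity, the list-of-lists image format (1 = black) and the size limits `max_w` and `max_h` are all assumptions made for illustration.

```python
from collections import deque

def extract_rectangles(image, max_w=8, max_h=8):
    """Return bounding boxes (x0, y0, x1, y1) of 8-connected black components.

    Only boxes no larger than max_w x max_h are kept, as a stand-in for
    "a size appropriate for a dotted-line element" (hypothetical limits).
    """
    h = len(image)
    w = len(image[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    rects = []
    for y in range(h):
        for x in range(w):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search over one connected component.
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            # Size filter: discard components too large to be a dot or dash.
            if (x1 - x0 + 1) <= max_w and (y1 - y0 + 1) <= max_h:
                rects.append((x0, y0, x1, y1))
    return rects
```

In this sketch a long solid line is rejected by the size filter, while short dashes survive as candidate rectangles for the later steps.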

[0015] A black pixel ratio calculation unit 5 counts the number of black pixels in each rectangle read from the rectangle memory 4 and calculates the ratio of the number of black pixels (Pixel) to the rectangle area (Area) as the black pixel occupancy (BlackRatio) (step 103).

BlackRatio = Pixel / Area

A dotted-line element determination unit 6 determines, based on the black pixel occupancy of each rectangle, whether the rectangle constitutes part of a dotted line, and stores the rectangles determined to be dotted-line elements in a dotted-line element memory 7 (step 104). That is, for rectangles that make up a dotted line (rA to rD in FIG. 3(b)), the image inside each rectangle can be expected to consist mostly of black pixels. In contrast, when characters of the same shape (left parentheses in this example) are lined up (ra to rd in FIG. 3(a)), the image inside each rectangle does not necessarily contain many black pixels. This property makes it possible to determine accurately whether a rectangle constitutes part of a dotted line. As the criterion for deciding whether a rectangle is a dotted-line element, a test such as whether the black pixel occupancy is at least 0.9 is used. In addition, candidate dotted-line elements may also be selected based on, for example, the width of the rectangle.
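The occupancy test of steps 103 and 104 can be sketched in Python as follows. The function names and the list-of-lists image format (1 = black) are assumptions for illustration; the 0.9 threshold is the example value given in the text.

```python
def black_ratio(image, rect):
    """BlackRatio = Pixel / Area for an inclusive rectangle (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = rect
    area = (x1 - x0 + 1) * (y1 - y0 + 1)
    pixels = sum(image[y][x]
                 for y in range(y0, y1 + 1)
                 for x in range(x0, x1 + 1))
    return pixels / area

def is_dotted_element(image, rect, threshold=0.9):
    """Accept the rectangle as a dotted-line element if it is mostly black."""
    return black_ratio(image, rect) >= threshold
```

A filled dash gives an occupancy close to 1.0, whereas the bounding box of a thin curved stroke such as a parenthesis is mostly white, so it falls below the threshold even when its size and spacing resemble those of a dash.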

[0016] A dotted ruled line extraction unit 8 reads the dotted-line elements from the dotted-line element memory 7 and extracts dotted ruled lines by integrating those elements that are appropriate as a dotted line, based on the intervals between the elements (step 105). As shown in FIGS. 15, 16 and 17 of the publication cited above, performing this integration in multiple stages allows the ruled lines to be extracted more accurately.
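The interval-based integration of step 105 might look like the single-stage Python sketch below for the main scanning (horizontal) direction. The text does not give the actual criteria, so the parameters `max_gap`, `max_y_shift` and `min_elements`, and the greedy chaining strategy, are assumptions; the multi-stage integration of the cited publication is not reproduced here.

```python
def integrate_horizontal(rects, max_gap=4, max_y_shift=1, min_elements=3):
    """Chain dotted-line elements left to right and return line bounding boxes.

    rects: list of inclusive rectangles (x0, y0, x1, y1).
    Two elements are chained when the white gap between them is at most
    max_gap pixels and their top edges differ by at most max_y_shift.
    """
    chains = []
    for r in sorted(rects):  # sorted by x0 first
        for chain in chains:
            last = chain[-1]
            gap = r[0] - last[2] - 1  # white pixels between the two elements
            if 0 <= gap <= max_gap and abs(r[1] - last[1]) <= max_y_shift:
                chain.append(r)
                break
        else:
            chains.append([r])  # no suitable chain: start a new one
    lines = []
    for chain in chains:
        if len(chain) >= min_elements:  # too few elements is not a line
            lines.append((min(r[0] for r in chain), min(r[1] for r in chain),
                          max(r[2] for r in chain), max(r[3] for r in chain)))
    return lines
```

Requiring a minimum number of chained elements is one simple way to keep isolated dots, such as punctuation, from being reported as a ruled line.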

[0017] <Embodiment 2> FIG. 4 shows the configuration of Embodiment 2 of the present invention, and FIG. 5 is its processing flowchart. The binary image input unit 21, binary image memory 22, rectangle extraction unit 23 and rectangle memory 24 are the same as in Embodiment 1, so their description is omitted. The processing up to the rectangle extraction in step 202 is also the same as in Embodiment 1.

[0018] First, a dotted ruled line extraction unit 25 refers to the rectangle memory 24, integrates rectangles in the main scanning direction, extracts dotted ruled lines, and stores them in a dotted ruled line memory 26 (step 203). A labeling unit 27 then labels the rectangles that made up the ruled lines in the main scanning direction (step 204). FIG. 6 illustrates the labeling of rectangles: h1 is a solid line in the main scanning direction, and h2 to h4 are the extracted dotted ruled lines in the main scanning direction.

[0019] Since the dotted ruled lines h2 to h4 in the main scanning direction have been extracted as described above, the labeling unit 27 attaches an "already extracted" label to the rectangles that constitute them (the hatched rectangles in the figure).

[0020] Next, a dotted ruled line extraction unit 28 refers to the rectangle memory 24, integrates rectangles in the sub-scanning direction, and extracts dotted ruled lines (step 205). In doing so, the rectangles previously labeled by the labeling unit 27 are excluded from integration, and integration is performed based on the intervals between the remaining rectangles (the grid-patterned rectangles arranged in the vertical direction in FIG. 6). As FIG. 6 shows, the processing of Embodiment 2 suppresses the generation of pseudo ruled lines such as R1, so that only the intended dotted ruled lines in the sub-scanning direction, such as v1, are extracted.
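The label-and-exclude idea of steps 203 to 205 can be sketched in Python as follows. The chaining criteria (`max_gap`, `max_shift`, `min_elements`) and the function names are hypothetical; only the exclusion of rectangles already used in the main scanning direction is taken from the text.

```python
def integrate(rects, axis, max_gap=4, max_shift=1, min_elements=3):
    """Chain rectangles (x0, y0, x1, y1) along one axis.

    axis 0 = main scanning direction (x), axis 1 = sub-scanning direction (y).
    Returns chains as lists of indices into rects.
    """
    other = 1 - axis
    order = sorted(range(len(rects)), key=lambda i: rects[i][axis])
    chains = []
    for i in order:
        r = rects[i]
        for chain in chains:
            last = rects[chain[-1]]
            gap = r[axis] - last[axis + 2] - 1  # white pixels along the axis
            if 0 <= gap <= max_gap and abs(r[other] - last[other]) <= max_shift:
                chain.append(i)
                break
        else:
            chains.append([i])
    return [c for c in chains if len(c) >= min_elements]

def extract_dotted_lines(rects):
    """Extract horizontal lines first, then vertical lines from what is left."""
    horizontal = integrate(rects, axis=0)
    # "Already extracted" label: indices used by a horizontal dotted line.
    labeled = {i for chain in horizontal for i in chain}
    remaining = [i for i in range(len(rects)) if i not in labeled]
    local = integrate([rects[i] for i in remaining], axis=1)
    vertical = [[remaining[j] for j in chain] for chain in local]
    return horizontal, vertical
```

With three closely spaced horizontal dotted lines, the dashes also line up vertically, so running the vertical pass on all rectangles would report pseudo ruled lines. Excluding the labeled rectangles leaves only the genuinely vertical elements.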

[0021] <Embodiment 3> FIG. 7 shows the configuration of Embodiment 3 of the present invention, and FIG. 8 is its processing flowchart.

[0022] A binary image input unit 31 such as a scanner reads an original such as a document or a form as a binary image and stores it in a binary image memory 32 (step 301). A black run extraction unit 33 extracts from the binary image memory 32 the black runs whose length is equal to or greater than a predetermined threshold, and stores their data (coordinates of the start point and end point, and so on) in a black run memory 34 (step 302).

[0023] A solid ruled line recognition unit 35 checks whether the black runs stored in the black run memory 34 lie within a predetermined threshold distance of each other, integrates all black runs that do, extracts the result as solid ruled lines, and stores them in a solid ruled line memory 36 (step 303).
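Steps 302 and 303 can be sketched in Python for the main scanning direction. The text does not define how the distance between runs is measured, so the rule used here (row difference and horizontal gap both at most `max_dist`) is an assumption, as are the function names and the list-of-lists image format (1 = black).

```python
def extract_black_runs(image, min_length):
    """Return horizontal black runs (x0, x1, y) at least min_length long."""
    runs = []
    for y, row in enumerate(image):
        start = None
        for x, v in enumerate(list(row) + [0]):  # trailing 0 closes a final run
            if v == 1 and start is None:
                start = x
            elif v != 1 and start is not None:
                if x - start >= min_length:
                    runs.append((start, x - 1, y))
                start = None
    return runs

def merge_runs(runs, max_dist=1):
    """Merge nearby runs into solid ruled lines (x0, y0, x1, y1) via union-find."""
    parent = list(range(len(runs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, (ax0, ax1, ay) in enumerate(runs):
        for j in range(i + 1, len(runs)):
            bx0, bx1, by = runs[j]
            x_gap = max(ax0, bx0) - min(ax1, bx1) - 1  # negative if overlapping
            if abs(ay - by) <= max_dist and x_gap <= max_dist:
                parent[find(i)] = find(j)
    boxes = {}
    for i, (x0, x1, y) in enumerate(runs):
        b = boxes.setdefault(find(i), [x0, y, x1, y])
        b[0], b[1] = min(b[0], x0), min(b[1], y)
        b[2], b[3] = max(b[2], x1), max(b[3], y)
    return sorted(tuple(b) for b in boxes.values())
```

The pairwise comparison is quadratic in the number of runs, which is acceptable for a sketch; a production implementation would more likely sweep row by row.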

[0024] In steps 304 to 307, dotted ruled lines are then extracted in the same manner as in Embodiment 1 or 2, and in step 308 the extracted solid ruled lines and dotted ruled lines are output together.

[0025] <Embodiment 4> FIG. 9 shows the configuration of Embodiment 4 of the present invention. A solid ruled line recognition unit 41 consists of the binary image input unit, binary image memory, black run extraction unit, black run memory and solid ruled line recognition unit of Embodiment 3. A dotted ruled line extraction unit 42 consists of the rectangle extraction unit, rectangle memory, dotted ruled line extraction unit (main scanning direction), dotted ruled line memory, labeling unit and dotted ruled line extraction unit (sub-scanning direction) of Embodiment 2. FIG. 10 is the processing flowchart of Embodiment 4.

[0026] The processing from step 401 to step 407 (dotted ruled line extraction) is the same as in Embodiment 3. This processing is performed in both the main scanning direction and the sub-scanning direction. A frame recognition unit 43 refers to the solid ruled lines and dotted ruled lines in the main scanning direction and the sub-scanning direction, extracts the frame regions enclosed on four sides, and stores them in a frame region memory 44 (step 408).
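Step 408 could be approximated by the Python sketch below, which looks for pairs of horizontal and vertical ruled lines that enclose a region on four sides. The line representation, the tolerance `tol` and the choice to keep only innermost frames (cells) are assumptions; the text only states that regions enclosed on four sides are extracted.

```python
from itertools import combinations

def recognize_frames(h_lines, v_lines, tol=2):
    """Return frames (x0, y0, x1, y1) enclosed on four sides by ruled lines.

    h_lines: (y, x0, x1, kind) for lines in the main scanning direction.
    v_lines: (x, y0, y1, kind) for lines in the sub-scanning direction.
    kind is "solid" or "dotted"; both count as frame sides.
    """
    frames = []
    for top, bottom in combinations(sorted(h_lines), 2):
        for left, right in combinations(sorted(v_lines), 2):
            if top[0] == bottom[0] or left[0] == right[0]:
                continue  # degenerate: zero height or zero width
            # Each horizontal side must span from the left to the right side.
            h_ok = all(h[1] - tol <= left[0] and right[0] <= h[2] + tol
                       for h in (top, bottom))
            # Each vertical side must span from the top to the bottom side.
            v_ok = all(v[1] - tol <= top[0] and bottom[0] <= v[2] + tol
                       for v in (left, right))
            if h_ok and v_ok:
                frames.append((left[0], top[0], right[0], bottom[0]))

    def contains(a, b):
        return (a != b and a[0] <= b[0] and a[1] <= b[1]
                and a[2] >= b[2] and a[3] >= b[3])

    # Keep only frames that do not enclose a smaller frame.
    return [f for f in frames if not any(contains(f, g) for g in frames)]
```

In this sketch a dotted ruled line splits an outer box into two cells in the same way a solid line would, which is the practical benefit of extracting both line types before frame recognition.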

[0027] <Embodiment 5> FIG. 11 shows the configuration of Embodiment 5 of the present invention, which adds a character recognition unit 55 to the configuration of Embodiment 4. FIG. 12 is the processing flowchart of Embodiment 5.

[0028] The processing up to the frame recognition in step 508 is the same as in Embodiment 4. The character recognition unit 55 refers to the frame region memory 54 and to the part of the binary image memory corresponding to each frame region, determines the character recognition area, performs character recognition on that area, and stores the result in a character recognition result memory 56 (step 509).

[0029] <Embodiment 6> FIG. 13 shows the configuration of Embodiment 6 of the present invention, in which the frame recognition unit of Embodiment 4 is replaced with a document reproduction unit 63. Description of the parts that are the same as in Embodiment 4 is omitted. FIG. 14 is the processing flowchart of Embodiment 6.

[0030] The processing up to step 607, which obtains the solid ruled lines and dotted ruled lines, is the same as in Embodiment 3. This processing is performed in both the main scanning direction and the sub-scanning direction, so that solid and dotted lines are extracted in both directions.

[0031] The document reproduction unit 63 reproduces the document from the ruled line data while distinguishing solid lines from dotted lines, and outputs the ruled line data to, for example, a DTP apparatus. As the reproduction method, the extracted ruled line data (the extent over which each ruled line exists) is vectorized, and information such as line thickness is attached.
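One possible reading of this vectorization is sketched below in Python: the extent of each ruled line is reduced to a centre-line segment that carries a thickness and a line type. The `RuledLine` record and the `vectorize` function are hypothetical; the text does not specify the output format expected by a DTP apparatus.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RuledLine:
    x0: float
    y0: float
    x1: float
    y1: float
    thickness: int  # extent across the line, in pixels
    kind: str       # "solid" or "dotted"

def vectorize(rect, kind):
    """Turn a ruled line's extent (x0, y0, x1, y1), inclusive, into a segment."""
    x0, y0, x1, y1 = rect
    width, height = x1 - x0 + 1, y1 - y0 + 1
    if width >= height:
        # Main scanning direction: segment runs along the vertical centre.
        yc = (y0 + y1) / 2
        return RuledLine(x0, yc, x1, yc, height, kind)
    # Sub-scanning direction: segment runs along the horizontal centre.
    xc = (x0 + x1) / 2
    return RuledLine(xc, y0, xc, y1, width, kind)
```

Keeping `kind` alongside the geometry lets a downstream renderer draw dotted ruled lines as dotted rather than as solid strokes.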

[0032] <Embodiment 7> FIG. 15 shows the configuration of Embodiment 7 of the present invention. Block 71 has the configuration shown in FIG. 11, and a document reproduction unit 72 is connected to block 71. The character recognition results and the ruled line data are input to the document reproduction unit 72. FIG. 16 is the processing flowchart of Embodiment 7.

[0033] Description of the parts that are the same as in Embodiment 5 is omitted. The processing up to step 709, which obtains the solid ruled lines, dotted ruled lines and character recognition results, is the same. The document reproduction unit 72 uses the extracted ruled lines and character codes to reproduce the document. Ruled lines are reproduced as in Embodiment 6. Characters are reproduced by placing each recognized character code at a position calculated from the absolute coordinates it occupied in the binary image memory. This is effective, for example, when both the characters and the ruled lines on a document are to be input as data to a DTP system or the like.

[0034] <Embodiment 8> FIG. 17 shows the configuration of Embodiment 8 of the present invention. This embodiment is implemented in software and uses a general-purpose processing apparatus comprising a CPU 81, a memory 82, a hard disk 83, an input device 84, a CD-ROM drive 85 and so on. A recording medium 86 such as a CD-ROM stores a program that realizes the processing functions and procedures of the ruled line recognition method and the table processing method of the present invention. Original images of documents, forms and the like are stored, for example, on the hard disk 83. The CPU 81 reads the program that realizes the above functions and procedures from the recording medium 86, executes it step by step, and recognizes and outputs ruled lines and the like.

[0035]

[Effects of the Invention] As described above, according to the inventions of claims 1 and 8, dotted lines can be recognized in a document in which characters or parts of characters are printed side by side, without those characters being erroneously extracted as dotted lines.

[0036] According to the inventions of claims 2 and 9, even for a document, such as a table with narrow line spacing, that contains a plurality of dotted lines, only the true dotted lines can be recognized accurately, without dotted lines being erroneously extracted in the orthogonal direction.

[0037] According to the invention of claim 3, dotted lines can be recognized accurately together with conventional solid line recognition.

[0038] According to the invention of claim 4, a conventional table processing method that performs frame recognition becomes able to process tables containing dotted lines, which increases the range of document types that can be processed.

[0039] According to the invention of claim 5, a conventional table processing method that recognizes the characters written inside frames after frame recognition becomes able to process tables containing dotted lines, which significantly increases the range of document types that can be processed.

[0040] According to the invention of claim 6, when format information consisting of ruled lines is input to a DTP system or the like on the basis of a document printed on paper, dotted ruled lines that could not be recognized before can also be recognized, so the document can be reproduced more faithfully.

[0041] According to the invention of claim 7, when a document containing characters and ruled lines, such as a table, is input to a DTP system or the like on the basis of a document printed on paper, documents with dotted ruled lines that could not be recognized before, and documents in which dotted lines hindered character recognition, can also be handled as input, so a wide variety of documents can be reproduced more faithfully.

[Brief description of the drawings]

FIG. 1 shows the configuration of Embodiment 1 of the present invention.

FIG. 2 is a processing flowchart of Embodiment 1 of the present invention.

FIG. 3 is a diagram illustrating the black pixel occupancy.

FIG. 4 shows the configuration of Embodiment 2 of the present invention.

FIG. 5 is a processing flowchart of Embodiment 2 of the present invention.

FIG. 6 is a diagram illustrating the labeling of rectangles.

FIG. 7 shows the configuration of Embodiment 3 of the present invention.

FIG. 8 is a processing flowchart of Embodiment 3 of the present invention.

FIG. 9 shows the configuration of Embodiment 4 of the present invention.

FIG. 10 is a processing flowchart of Embodiment 4 of the present invention.

FIG. 11 shows the configuration of Embodiment 5 of the present invention.

FIG. 12 is a processing flowchart of Embodiment 5 of the present invention.

FIG. 13 shows the configuration of Embodiment 6 of the present invention.

FIG. 14 is a processing flowchart of Embodiment 6 of the present invention.

FIG. 15 shows the configuration of Embodiment 7 of the present invention.

FIG. 16 is a processing flowchart of Embodiment 7 of the present invention.

FIG. 17 shows the configuration of Embodiment 8 of the present invention.

[Explanation of symbols]

1 binary image input unit
2 binary image memory
3 rectangle extraction unit
4 rectangle memory
5 black pixel ratio calculation unit
6 dotted-line element determination unit
7 dotted-line element memory
8 dotted ruled line extraction unit

Claims

[Claims]

1. A ruled line recognition method that extracts, from a binary image, rectangles each containing all connected black pixels and integrates the rectangles to extract a dotted ruled line, characterized in that validity as a dotted ruled line is determined based on the ratio of black pixels in each extracted rectangle.

2. A ruled line recognition method that extracts, from a binary image, rectangles each containing all connected black pixels and integrates the rectangles to extract dotted ruled lines, characterized in that a label is attached to each rectangle used to extract a dotted ruled line in the main scanning direction or the sub-scanning direction of the image, and the labeled rectangles are not used when extracting dotted ruled lines in the sub-scanning direction or the main scanning direction, respectively.

3. A ruled line recognition method characterized by performing: a process of extracting, from a binary image, black runs whose length is equal to or greater than a predetermined threshold and, when extracted black runs lie within a predetermined threshold distance of each other, integrating those black runs and extracting them as a solid ruled line; and a process of extracting dotted ruled lines by the method according to claim 1 or 2.

4. A table processing method characterized in that the processing according to claim 3 is performed in the main scanning direction and the sub-scanning direction to extract ruled lines in the main scanning direction and ruled lines in the sub-scanning direction, and the extracted ruled lines are combined to recognize frames.

5. A table processing method characterized in that characters within a frame are extracted from the frame region extracted by the processing according to claim 4 and are subjected to character recognition.

6. A table processing method characterized in that the coordinate values and the ruled line type of each solid ruled line and dotted ruled line extracted by the processing according to claim 3 are output, and a document is reproduced in accordance with the ruled line types.

7. A table processing method characterized in that the character codes recognized by the processing according to claim 5 are output and, at the same time, ruled lines are output by the processing according to claim 6, so that a document is reproduced.

8. A recording medium on which is recorded a program for causing a computer to realize a function of extracting, from a binary image, rectangles each containing all connected black pixels and integrating the rectangles to extract a dotted ruled line, characterized in that the recorded program realizes a function of determining validity as a dotted ruled line based on the ratio of black pixels in each extracted rectangle.

9. A recording medium on which is recorded a program for causing a computer to realize a function of extracting, from a binary image, rectangles each containing all connected black pixels and integrating the rectangles to extract dotted ruled lines, characterized in that the recorded program realizes a function of attaching a label to each rectangle used to extract a dotted ruled line in the main scanning direction or the sub-scanning direction of the image, and of not using the labeled rectangles when extracting dotted ruled lines in the sub-scanning direction or the main scanning direction, respectively.