JPH03230289A

JPH03230289A - Method for determining ruled line and character recognizing device

Info

Publication number: JPH03230289A
Application number: JP2026281A
Authority: JP
Inventors: Hiroshi Sasaki; 央佐々木
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1990-02-06
Filing date: 1990-02-06
Publication date: 1991-10-14
Anticipated expiration: 2012-11-26
Also published as: JP2683290B2

Abstract

PURPOSE: To accurately determine ruled lines in a table-type document whose format is not fixed, by applying the normal histogram to each line judged to be a temporary ruled line candidate and converting those lines judged to be real ruled lines from temporary ruled line candidates into ruled line candidates. CONSTITUTION: A normal histogram is formed from a table-type document image. An OR image is then formed by ORing the black pixels of each pair of pixels that share the same coordinate along the ruled line direction on two consecutive lines, and the black pixels on each line of the OR image are counted to form an OR histogram. The normal histogram is then applied to each line judged to be a temporary ruled line candidate to assess how likely it is to be a ruled line, and each line judged to be a real ruled line becomes a final ruled line candidate. Even in a table-type document whose format is not fixed, ruled lines can be recognized accurately, which allows accurate character recognition.

Description

[Detailed Description of the Invention]

[Summary]

The invention relates to a character recognition device that takes image input, and aims to determine ruled lines accurately in a tabular document whose format is not fixed. A normal histogram of the number of black pixels on each scan line is created, together with an OR histogram in which black pixels are counted line by line on an OR image formed from the black pixels of two consecutive lines. Peaks of the OR histogram, where the histogram data on successive lines turns from increasing to decreasing, are then identified. By comparing the height or width of the peak on any line of interest with the adjacent peaks or with the histogram data of the preceding and following lines, that line is classified as a ruled line candidate (judged to be a true ruled line), a temporary ruled line candidate (judged likely to be a ruled line), or not a ruled line. Each line classified as a temporary ruled line candidate is judged again using the normal histogram, and a line that the normal histogram judges to be a true ruled line is converted from a temporary ruled line candidate into a ruled line candidate.

[Industrial application field]

The present invention relates to a method for recognizing the ruled lines of a tabular document in a character recognition device, and to a character recognition device that has ruled line deletion means and performs character recognition after the ruled lines have been deleted.

[Prior art]

FIG. 18 shows the character recognition method used in a conventional character recognition device.

In a conventional character recognition device, document reading means 91, such as an image scanner, first reads the characters, ruled lines, and other content of a document 90 as an image.

Next, character pattern separation means 92 separates and extracts the pattern of each character from the read image, and character recognition means 93 performs character recognition on each extracted character pattern.

The separation performed by character pattern separation means 92 proceeds as follows. First, blocks that are collections of text lines are recognized in the read image (94). Next, the text lines, each a collection of characters, are recognized within each block and separated one text line at a time (95). After the text lines have been separated, the characters within each text line are separated one character at a time, and character recognition is performed on each separated character.

[Problems to be solved by the invention]

When a character recognition device reads a tabular document whose format is not fixed as an image, the ruled lines are read into the image together with the characters.

Because a conventional character reading device performs text line separation and character separation on the read image with the ruled lines still in it, ruled lines are confused with parts of character patterns. This makes correct character separation difficult and lowers character recognition accuracy.

An object of the present invention is to recognize ruled lines accurately in a tabular document whose format is not fixed, and to perform accurate character recognition on image data from which the ruled lines have been deleted.

[Means for solving the problems]

In the present invention, a normal histogram is first created from the image of a tabular document by counting, line by line, the number of black pixels on each line that runs in the ruled line direction of the document scan.

Next, an OR image is created by ORing, for two consecutive lines, the black pixels of each pair of pixels that have the same position coordinate along the ruled line direction.

The number of black pixels on each line of the OR image is then counted to create an OR histogram.
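The two histograms described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the image is a list of rows of 0/1 values (1 = black), the ruled lines are horizontal, and the last line, which has no following line, is ORed with an all-white line. The function names are hypothetical and do not come from the patent.

```python
def normal_histogram(image):
    """Count the black pixels (1s) on each scan line."""
    return [sum(row) for row in image]


def or_image(image):
    """OR every line with the line that follows it, pixel by pixel.

    Assumption: the last line has no successor, so it is ORed with an
    all-white line and therefore stays unchanged.
    """
    result = []
    for i, row in enumerate(image):
        following = image[i + 1] if i + 1 < len(image) else [0] * len(row)
        result.append([a | b for a, b in zip(row, following)])
    return result


def or_histogram(image):
    """Count the black pixels on each line of the OR image."""
    return normal_histogram(or_image(image))


# Small example: one solid line followed by two stray pixels.
image = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
nr = normal_histogram(image)   # [0, 6, 2, 0]
orh = or_histogram(image)      # [6, 6, 2, 0]
```

Because each OR line merges a line with its successor, the solid line at index 1 also shows up at index 0 of the OR histogram.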

Peaks of the OR histogram, where the histogram data on successive lines turns from increasing to decreasing, are then identified, and each peak is classified either as a ruled line candidate, judged to be a true ruled line, or as a temporary ruled line candidate, judged likely to be a ruled line.

A line judged to be a ruled line candidate is kept as a final ruled line candidate. For a line judged to be a temporary ruled line candidate, the likelihood that it is a ruled line is assessed again using the normal histogram, and a line judged there to be a true ruled line becomes a final ruled line candidate.

After that, the ruled line candidates are deleted, and character recognition is performed on image data that contains only the character portions.

FIG. 1 shows the basic configuration of the ruled line determination method of the present invention.

In the figure, 1 is a tabular document; 2 is the original image of the document as read by the reading means; 3 is the normal histogram (NR histogram), obtained from original image 2 by counting, line by line, the number of black pixels on each line in the ruled line direction of the scan; 4 is the OR image, created by ORing the black pixels at the same position along the ruled line direction on two consecutive lines in order to compensate for gaps, misalignment, and similar defects in the ruled line image of the original image; and 5 is the OR histogram, created by counting the black pixels on each line of the OR image.

The flow in the figure shows the principle of the ruled line determination method of the present invention.

[Operation]

The principle of the ruled line determination method of the present invention is explained below, following the flow numbers in the figure.

■ The characters and ruled lines of tabular document 1 are read as an image, and original image 2, consisting of black pixels and white pixels, is created.

■ From the read image data, NR histogram 3 is created by counting, line by line, the number of black pixels on each line in the ruled line direction of the scan.

■ An OR image is created from original image 2 by ORing the black pixels of two consecutive lines, and the black pixels on each line are counted to create OR histogram 5.

■, ■ In the OR histogram, peaks where the histogram data turns from increasing to decreasing are detected. The width and other properties of the histogram at each peak are judged by comparing the height of the peak with that of the adjacent peak, or by comparing it with the histogram values before and after the peak. On that basis, each line with a peak in the OR histogram is judged to be a ruled line candidate (a true ruled line), a temporary ruled line candidate (likely a ruled line), or not a ruled line.

■, ■ For a line that the OR histogram judges to be a temporary ruled line candidate, the width of the peak on that line is examined in NR histogram 3, using the histogram values of the lines before and after the peak, to judge whether the line is a true ruled line.

■, ■ A peak judged there to be a true ruled line candidate is converted from a temporary ruled line candidate into a ruled line candidate.

■, ■ A peak that the normal histogram judges not to be a ruled line remains a temporary ruled line candidate.

■, ■ A peak that the OR histogram judges to be a true ruled line becomes a final ruled line candidate at that point.

■, ■ A peak that the OR histogram judges not to be a ruled line is finally treated as not a ruled line at that point.
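The branching in the flow above can be summarized as a small decision function. This is a minimal sketch, assuming each peak has already received a label from the OR histogram and, where needed, from the NR histogram; the label strings and the function name are hypothetical.

```python
def final_decision(or_label, nr_label=None):
    """Combine the OR histogram verdict with the NR histogram re-check.

    or_label: "ruled", "temporary" or "none" (verdict from the OR histogram).
    nr_label: verdict from the NR histogram, consulted only when the
              OR histogram returned "temporary".
    """
    if or_label == "ruled":
        return "ruled"          # final ruled line candidate straight away
    if or_label == "none":
        return "none"           # finally not a ruled line
    # Temporary candidate: promoted only if the NR histogram confirms it.
    return "ruled" if nr_label == "ruled" else "temporary"
```

For example, `final_decision("temporary", "ruled")` gives `"ruled"`, while `final_decision("temporary", "none")` leaves the peak as `"temporary"`.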

According to the recognition method of the present invention, gaps and positional shifts in the ruled lines of the original image are compensated for by the OR image, and the ruled line determination is made on the compensated image. In addition, temporary ruled line candidates that cannot be judged with certainty to be ruled lines are judged again using the normal histogram of the original image. Together these allow accurate ruled line determination.

〔Example〕

FIG. 2 shows the configuration of an embodiment of a character recognition device that determines ruled lines by the ruled line determination method of the present invention, deletes them, and then performs character recognition.

In the figure, 21 is a tabular document; 22 is document reading means, such as an image scanner, that reads the characters and ruled lines of tabular document 21 as an image; 23 is ruled line determination means that determines the ruled line portions of the read image; 24 is ruled line deletion processing means that deletes ruled lines from the image according to the result from the ruled line determination means; 25 is character recognition processing means that recognizes characters in the image from which the ruled lines have been deleted; 26 is a microprocessor that executes programs such as the ruled line determination program and the character recognition program; 27 is a shared image data memory that stores data such as the read image; 28 is a work memory used for operations such as ruled line determination and character recognition; 30 is the ruled line determination process that determines the ruled line portions of the read image; 31 is the result data of ruled line determination process 30; 32 is the process that separates the pattern of each character from the image from which the ruled lines have been deleted; and 33 is the process that performs character recognition on each separated character pattern.

Next, the ruled line recognition method used in the character recognition device of this embodiment is explained with reference to FIGS. 3 to 17.

The embodiment below describes the recognition of horizontal ruled lines, but vertical ruled lines can be handled by the same method.

The ruled line recognition method of the present invention is explained below in the order of the figure numbers, from FIG. 3 to FIG. 17.

(1) As shown in FIG. 3(a), before the ruled line recognition process, a compressed image is first created from the image of the input original.

In FIG. 3(a), compressed image 41 is an example of an image obtained by compressing the data of original 40.

Compression is performed, for example by deleting part of the white pixels, at a compression ratio in the direction perpendicular to the ruled lines to be extracted (the vertical direction in the figure) that always leaves a gap of at least one pixel between a ruled line and the characters next to it (g) in the compressed image.

Compressing the image of the original turns the character portions into solid blobs, which makes them easier to distinguish from the ruled line portions.

(2) Next, as shown in FIG. 3(b), compressed image 42 is divided into sections of 16 pixels each, and histogram 43 of the number of black pixels is created along the direction perpendicular to the ruled lines to be extracted (the vertical direction in the figure).

The image is divided into 16-pixel sections so that ruled lines can be read accurately when the recognition condition is that the inclination of a ruled line is within −3° to +3°.
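The source does not spell out the arithmetic behind the 16-pixel section width. One plausible reading, offered here purely as an assumption, is that a line tilted by 3° drifts by less than one pixel across 16 pixels, so it stays on essentially the same scan line within a section. The short calculation below checks that figure; the function name is hypothetical.

```python
import math


def vertical_drift(width_px, tilt_deg):
    """Vertical displacement, in pixels, of a line tilted by tilt_deg
    after it has run width_px pixels horizontally."""
    return width_px * math.tan(math.radians(tilt_deg))


drift_16 = vertical_drift(16, 3.0)   # about 0.84 pixel, below one pixel
drift_32 = vertical_drift(32, 3.0)   # about 1.68 pixels, above one pixel
```

Under this reading, a section twice as wide would already let a 3° line cross into the next scan line.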

Alternatively, as shown in FIG. 4, the inclination 44 at which the document was set is read in advance, the image of compressed image 45 is traced along the inclination of the ruled lines, and histogram 46 is created by counting the black pixels.

In the present invention, two kinds of histogram are created for the compressed image of the original (the original image). One is the normal histogram (hereinafter the NR histogram), created by counting the black pixels on each scan line of the image. The other is the OR histogram, created by counting the black pixels on each line of an OR image formed by ORing the black pixels of each pixel line with those of the next line.

FIG. 5(a) shows an example of an NR histogram.

In the figure, 47 is the original image, that is, the compressed image of the original, and 48 is the NR histogram of the black pixel counts of original image 47.

FIG. 5(b) shows an example of an OR histogram.

In the figure, 49 is the original image, 50 is the OR image created by ORing the pixels of each line with the pixels of the next line, and 51 is the OR histogram created by counting the black pixels on each line of the OR image.

The reason for creating an OR histogram in the present invention is explained with reference to FIG. 6.

In the figure, line 1 of original images 52 and 53 is a ruled line. When the ruled line has a gap, as shown, the values for that line in NR histogram 54 become small, as at ■ and ■ in the NR histogram, and the line is hard to identify as a ruled line.

In OR histogram 54', created from the OR image of the black pixels of line 1 and line 2, the histogram becomes as shown at ■″ and ■, which makes it clear that the line is a ruled line.

When a ruled line is misaligned, as with line 4 of original image 52 and line 5 of the original image, the two parts appear in NR histogram 54 as isolated single lines, as at ■ and ■. In the OR histogram the ruled line stands out, as at ■゛ and ■°, so it is not overlooked as a ruled line in the ruled line determination process.
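The effect of a misaligned ruled line can be reproduced with a toy image. The sketch below is an illustration only: the 8-pixel-wide image, the 0/1 pixel encoding, and the helper name `histograms` are assumptions, and the last line is ORed with an all-white line.

```python
def histograms(image):
    """Return (NR histogram, OR histogram) for a list of 0/1 rows."""
    nr = [sum(row) for row in image]
    padded = image + [[0] * len(image[0])]   # white line after the last row
    orh = [
        sum(a | b for a, b in zip(padded[i], padded[i + 1]))
        for i in range(len(image))
    ]
    return nr, orh


# A ruled line whose right half has slipped down by one scan line.
shifted = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
nr_shifted, or_shifted = histograms(shifted)
# nr_shifted == [0, 4, 4, 0]: two weak, separate half peaks
# or_shifted == [4, 8, 4, 0]: one full-width peak at line 1
```

The NR histogram never exceeds half the image width, while the OR histogram reaches the full width of 8 on the line where the two halves meet.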

(3) In the present invention, ruled lines are first determined using the OR histogram created as described above.

The method of determining ruled lines from the OR histogram is explained with reference to FIGS. 7(a) and 7(b).

First, as shown in FIG. 7(a), the positions of the peaks, where the histogram value turns from increasing to decreasing, are extracted.

In the OR histogram shown, four peaks are extracted, peak 0 to peak 3.
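Peak extraction of this kind can be sketched as a single pass over the histogram. The source does not say how flat tops or the ends of the histogram are treated, so the choices below are assumptions: a flat top is reported at its first line, the value before the first line is taken as 0, and a rise that is still in progress at the last line counts as a peak. The function name is hypothetical.

```python
def find_peaks(hist):
    """Return the line indices where the histogram turns from rising to falling."""
    peaks = []
    rising = False
    start = 0          # first line of the current top
    previous = 0       # assumption: value before the first line is 0
    for i, value in enumerate(hist):
        if value > previous:
            rising = True
            start = i
        elif value < previous and rising:
            peaks.append(start)
            rising = False
        previous = value
    if rising:         # assumption: a rise that reaches the end is a peak
        peaks.append(start)
    return peaks


example = [0, 16, 16, 3, 9, 2, 0, 7, 1, 12]
example_peaks = find_peaks(example)   # [1, 4, 7, 9]
```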

Next, taking the peaks in descending order (downward in the figure) two at a time as a pair, it is judged whether the line with the peak is a ruled line.

First, when both peaks of a pair have the maximum value (peak 0 and peak 1), the line with the upper peak is judged to be possibly a ruled line, and peak 0 is extracted as a ruled line determination candidate.

When this condition is not met, the smaller of the two peaks in the pair is selected and its value is called PK (for peak 1 and peak 2, peak 2 gives PK).

Then the line between the two peaks where the histogram value is lowest (vlly) is found, and that value is called VO (for peak 1 and peak 2, vlly2 gives VO). For peak 3 at the end, the value of peak 3 is PK and vlly4 gives VO.

VO is compared with a preset valley threshold (about 4), and the smaller of the two is called VL.

When PK and VO satisfy conditional expression A below, the upper of the two peaks in the pair is judged to be possibly a ruled line peak and is extracted as a ruled line determination candidate.

Conditional expression A: (PK − VO) > VL

For peak 1 and peak 2, the selected peak 2 satisfies this condition, so peak 1 becomes a ruled line determination candidate.
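The pairwise screening with conditional expression A can be sketched as follows. The sketch makes several assumptions: the maximum histogram value is 16 (the section width), the valley threshold is exactly 4, and VO is taken as the minimum over the lines from the upper peak to the lower peak inclusive, which equals the valley value whenever a valley lies between them. The separate handling of the last peak is left out. All names are hypothetical.

```python
MAX_VALUE = 16          # largest possible histogram value in a 16-pixel section
VALLEY_THRESHOLD = 4    # "about 4" in the text


def upper_peak_is_candidate(hist, upper, lower):
    """Screen a pair of neighbouring peaks at line indices upper < lower.

    Returns True when the upper peak should be kept as a ruled line
    determination candidate.
    """
    if hist[upper] == MAX_VALUE and hist[lower] == MAX_VALUE:
        return True                          # both peaks at the maximum
    pk = min(hist[upper], hist[lower])       # PK: smaller of the two peaks
    vo = min(hist[upper:lower + 1])          # VO: lowest value between them
    vl = min(vo, VALLEY_THRESHOLD)           # VL: smaller of VO and threshold
    return (pk - vo) > vl                    # conditional expression A


hist = [0, 16, 2, 16, 1, 9, 8, 10, 0]       # peaks at lines 1, 3, 5 and 7
pair_a = upper_peak_is_candidate(hist, 1, 3)   # True: both peaks are 16
pair_b = upper_peak_is_candidate(hist, 3, 5)   # True: 9 - 1 = 8 > 1
pair_c = upper_peak_is_candidate(hist, 5, 7)   # False: 9 - 8 = 1, not > 4
```

The third pair fails because the dip between the peaks is too shallow, which is the situation the valley threshold is meant to reject.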

(4) Next, for each ruled line determination candidate obtained by the method above, the histogram is scanned in ascending order (upward in the figure) and in descending order from the extracted peak, the histogram values and width are examined, and true ruled line candidates and temporary ruled line candidates are extracted.

The determination conditions are as follows.

The following values are defined as processing parameters.

Peak threshold 1: Pt1 (suitable value = 5)
Peak threshold 2: Pt2 (suitable value = 2)
Threshold ratio relative to peak: Pp (suitable value = 3)
Ruled line width threshold 1: Lt1 (suitable value = 2 mm)
Ruled line width threshold 2: Lt2 (suitable value = 3 mm)

Next, using these thresholds and taking the OR histogram value of the ruled line determination candidate as the peak value, the following conditions are evaluated.

1. The difference between the peak value and Pt1 is th1: th1 = (peak value) − Pt1

2. The quotient of the peak value divided by Pt2 is th2: th2 = (peak value) / Pt2

3. The smaller of th1 and th2 is th3.

4. The quotient of the peak value divided by Pp is th4: th4 = (peak value) / Pp

5. The smaller of th3 and th4 is TH2, and the larger is TH1.

6. Next, the histogram is scanned in descending and ascending order from the peak position, and the number of lines in the range where the histogram value is greater than TH1 is counted. This number of lines is w1.

7. The histogram is scanned in the same way, and the number of lines in the range where the histogram value is greater than TH2 is counted. This number of lines is w2.

8. If w1 and w2 satisfy either one of the following conditional expressions, the peak is judged to be a true ruled line candidate.

Conditional expression B: Lt1 ≥ w1
Conditional expression C: Lt2 ≥ w2

9. If the peak satisfies neither conditional expression B nor conditional expression C, its histogram value is examined. If it is the maximum value (= 16), the peak becomes a temporary ruled line candidate, judged likely to be a ruled line.

Otherwise, the peak becomes neither a true ruled line candidate nor a temporary ruled line candidate.
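Steps 1 to 9 can be put together in one function. This is a sketch under assumptions that the source leaves open: the width thresholds Lt1 and Lt2 are passed as numbers of lines (the text gives them in millimetres, and the conversion depends on the scan resolution), and w1 and w2 count only the unbroken run of lines around the peak. Function and label names are hypothetical.

```python
def classify_peak(hist, peak, pt1=5, pt2=2, pp=3, lt1=2, lt2=3, max_value=16):
    """Classify one peak as "ruled", "temporary" or "none".

    lt1 and lt2 are width thresholds expressed in lines (assumption).
    """
    value = hist[peak]
    th1 = value - pt1                     # step 1
    th2 = value / pt2                     # step 2
    th3 = min(th1, th2)                   # step 3
    th4 = value / pp                      # step 4
    TH2, TH1 = sorted((th3, th4))         # step 5: TH2 smaller, TH1 larger

    def width(threshold):
        """Count contiguous lines around the peak whose value exceeds threshold."""
        count = 0
        i = peak
        while i >= 0 and hist[i] > threshold:
            count += 1
            i -= 1
        i = peak + 1
        while i < len(hist) and hist[i] > threshold:
            count += 1
            i += 1
        return count

    w1 = width(TH1)                       # step 6
    w2 = width(TH2)                       # step 7
    if lt1 >= w1 or lt2 >= w2:            # step 8: expression B or C
        return "ruled"
    if value == max_value:                # step 9
        return "temporary"
    return "none"


thin = classify_peak([0, 1, 16, 2, 0], 2)               # narrow, full height
thick = classify_peak([0, 12, 14, 16, 15, 13, 0], 3)    # wide, full height
weak = classify_peak([0, 7, 9, 10, 9, 8, 0], 3)         # wide, not full height
```

With the default parameters, a peak value of 16 gives th1 = 11, th2 = 8, th3 = 8 and th4 ≈ 5.33, so TH1 = 8 and TH2 ≈ 5.33. The three examples come out as `"ruled"`, `"temporary"` and `"none"` respectively.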

Hereinafter, a true ruled line candidate is simply called a ruled line candidate.

FIG. 7(b) shows the histogram together with TH1, TH2, w1, and w2 as used in the determination based on the peak value and width of the OR histogram described above. The figure shows the case where w1 = 2 and w2 = 4.

This completes the extraction of ruled line candidates and temporary ruled line candidates from the peaks of the OR histogram.

A peak judged to be a ruled line candidate by the OR histogram processing above is kept as a final ruled line candidate.

(5) Temporary ruled line candidates, on the other hand, are next judged using the NR histogram.

For a line that the OR histogram judged to be a temporary ruled line candidate, the relationship between the peak value and the thresholds is examined in the NR histogram in the same way as in (4). If the line is judged to be a ruled line candidate in the NR histogram, the temporary ruled line candidate is converted into a ruled line candidate.

Otherwise, it remains a temporary ruled line candidate.

FIG. 8 shows this relationship.

In the figure, (a) is the result of the determination using the OR histogram and (b) is the result obtained using the NR histogram.

The peaks on lines 1 and 4, which the OR histogram judged to be temporary ruled line candidates, are judged again using the NR histogram.

Because that judgment finds them to be true ruled lines, the figure shows lines 1 and 4 being converted into true ruled line candidates.

(6) Next, for the ruled line candidates and temporary ruled line candidates found in different pixel regions by the processing above, the continuity between ruled lines is judged.

FIG. 9(a) illustrates the process of judging whether each temporary ruled line candidate is continuous with a ruled line candidate and converting those that are into true ruled line candidates.

In the figure, the dotted portions are temporary ruled line candidates and the portions filled in black are ruled line candidates.

A temporary ruled line candidate is regarded as continuous with a ruled line candidate if it lies on the same line as the ruled line candidate and directly adjoins it on the left or right, or if it adjoins, on the same line, a temporary ruled line candidate that itself adjoins the ruled line candidate.

For example, in (a-1), temporary ruled line candidates S2 and S1 are continuous with ruled line candidate EO.

A temporary ruled line candidate in the pixel region one position diagonally above or below a ruled line candidate is also regarded as continuous with it. For example, temporary ruled line candidate S3 in the figure is continuous with ruled line candidate EO.

Likewise, a temporary ruled line candidate one position diagonally above or below a temporary ruled line candidate that is continuous with a ruled line candidate is regarded as continuous with the true ruled line candidate.

For example, temporary ruled line candidates S0 and S4 in the figure are regarded as continuous with the true ruled line candidate.

Temporary ruled line candidates judged to be continuous with a ruled line candidate are converted into ruled line candidates.

(a-2) of FIG. 9(a) shows the relationship between the ruled line candidates and the temporary ruled line candidates after conversion.

Temporary ruled line candidates S0, S1, S2, S3, and S4 in (a-1) are continuous with true ruled line EO, so they are converted into ruled line candidates as shown in (a-2).

Temporary ruled line candidates S7, S8, and S9 in (a-1) are not continuous with any ruled line candidate, so they are left as temporary ruled line candidates.
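The continuity rules above amount to spreading the "ruled line" label from each ruled line candidate to temporary candidates that touch it, directly or through other temporary candidates. A minimal sketch follows, assuming each candidate is identified by a (line, section) pair, where the section index numbers the 16-pixel pixel regions from left to right. The representation and the names are hypothetical.

```python
from collections import deque

# Neighbours that count as continuous: left and right on the same line,
# and one section diagonally above or below.
NEIGHBOURS = [(0, -1), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]


def promote_connected(ruled, temporary):
    """Promote temporary candidates that are continuous with a ruled candidate.

    ruled, temporary: iterables of (line, section) pairs.
    Returns (all ruled candidates after promotion, remaining temporary ones).
    """
    ruled = set(ruled)
    remaining = set(temporary)
    queue = deque(ruled)
    while queue:
        line, section = queue.popleft()
        for d_line, d_section in NEIGHBOURS:
            cell = (line + d_line, section + d_section)
            if cell in remaining:
                remaining.remove(cell)
                ruled.add(cell)
                queue.append(cell)       # promoted cells spread further
    return ruled, remaining


promoted, left_over = promote_connected(
    ruled={(5, 2)},
    temporary={(5, 1), (5, 0), (4, 3), (3, 4), (9, 0), (9, 1)},
)
# promoted  == {(5, 2), (5, 1), (5, 0), (4, 3), (3, 4)}
# left_over == {(9, 0), (9, 1)}
```

The two candidates on line 9 touch each other but no ruled line candidate, so they stay temporary, as S7, S8, and S9 do in the figure.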

(7) After the temporary ruled line candidate conversion in (6) above, double lines among the ruled line candidates are converted into single lines.

The double line conversion examines the spacing between neighbouring ruled line candidates in the y direction (perpendicular to the ruled lines to be extracted, in descending order) and, where the spacing is at or below a preset double line spacing threshold (about 1 mm is suitable), deletes one of the two ruled lines.

FIG. 9(b) illustrates the double line conversion.

Ruled line candidates L1 and L2 in (b-1) are 1 (1 mm) apart in the y direction, which is at or below the spacing threshold, so one of them is deleted to leave a single line as shown in (b-2). Of the two candidates forming a double line, the one with the smaller NR histogram value is deleted. The other ruled line candidates in (b-1) are further apart than the threshold, so nothing is deleted.
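The double line conversion can be sketched as one pass over the candidate lines. The sketch assumes the spacing threshold has already been converted from millimetres to a number of lines, and that candidates are given as line indices within one section. The names are hypothetical.

```python
def merge_double_lines(candidates, nr_hist, max_gap):
    """Keep one line of every pair of candidates that are at most max_gap apart.

    candidates: line indices of ruled line candidates.
    nr_hist:    NR histogram, used to decide which line of a pair survives.
    max_gap:    double line spacing threshold in lines (assumption: converted
                from the 1 mm given in the text).
    """
    kept = []
    for line in sorted(candidates):
        if kept and line - kept[-1] <= max_gap:
            # Double line: drop the one with the smaller NR histogram value.
            if nr_hist[line] > nr_hist[kept[-1]]:
                kept[-1] = line
        else:
            kept.append(line)
    return kept


nr_hist = [0, 12, 16, 0, 0, 0, 15, 0]
single = merge_double_lines([1, 2, 6], nr_hist, max_gap=1)   # [2, 6]
```

Lines 1 and 2 are one line apart, so only line 2, which has the larger NR histogram value, is kept. Line 6 is far enough away to be left alone.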

(8) Finally, for each ruled line candidate extracted from the NR histogram and the OR histogram, the image of the candidate's line is traced in the left-right direction on the compressed image, and the left and right end points of the candidate are extracted. The end point coordinates of the ruled line in the original image are then calculated back from the end point coordinates on the compressed image.

When a ruled line is traced on the compressed image, it may be interrupted by breaks, misalignment, blurring, and the like. Therefore an OR image is created from the black pixels of the line to be traced and of the lines one above and one below it, and the continuity of the black pixel data is judged on that OR image.

FIG. 10 shows the tracing method used to recognize the end points of ruled lines.

In the figure, 55 is the original image, and 56 is the OR image formed from original image 55 by ORing the black pixels of three lines: an arbitrary line and the lines one above and one below it.

When line 1 is traced, the continuity of black pixels is judged along line 1 of OR image 56, which is the OR of the black pixels of line 1 in original image 55 and of lines 0 and 2 above and below it. Continuity is judged by detecting the point where it breaks. The coordinates of that point are read, and tracing of that ruled line stops.
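
The three-line OR and the break detection can be sketched in Python as follows. This is a minimal sketch, assuming the image is a list of rows of 0/1 values with 1 for a black pixel. The function names or_three_lines and first_break are hypothetical.

```python
def or_three_lines(image, y):
    """OR line y with the lines directly above and below it.

    image: list of rows, each a list of 0/1 (1 = black pixel).
    Rows outside the image are simply skipped.
    """
    width = len(image[y])
    rows = [image[j] for j in (y - 1, y, y + 1) if 0 <= j < len(image)]
    return [int(any(row[x] for row in rows)) for x in range(width)]

def first_break(or_line, start):
    """Return the x of the first white pixel at or after `start`,
    or len(or_line) if the black run reaches the end of the line."""
    for x in range(start, len(or_line)):
        if not or_line[x]:
            return x
    return len(or_line)
```

A ruled line that drifts across lines 0, 1, and 2 appears as one unbroken run on the ORed line, so the break is found only where the line really ends.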

The method for recognizing the left and right end points of a ruled line is explained with reference to FIG. 11.

In the figure, A denotes the histogram information, and E denotes a ruled line candidate.

B in the figure is the compressed image.

Histogram information A is scanned sequentially from the left, in the vertical direction indicated by the arrows.

When a ruled line candidate is detected at L1 on image A as a result, the corresponding ruled line image K1 in image B is traced.

The ruled line image is traced up to the position of a gap (G) larger than the white pixel threshold, a preset constant that expresses the gap length (an appropriate value is 1 mm when dotted lines are to be extracted as well, and "Qmm" [sic] when only solid lines are to be extracted).

For example, ruled line image K1 in image B is traced between its left end LP1 and its right end RP1.

In this case a gap G is detected at position (1, 3), but the histogram information at that position is a ruled line candidate, so tracing continues to the right from position LP3 until the next gap is detected.

Tracing ruled line image K3 from LP3 detects a gap at (3, 3), so K3 has left end LP3 and right end RP3. Because the histogram information at (3, 3), where the gap lies, is not a ruled line candidate, tracing of the ruled line image on this line ends.

As described above, tracing of the ruled line image is not limited to positions that are ruled line candidates in the histogram information (FIG. 11, A). Where the ruled line image continues from the image of a ruled line candidate into a position that is not a candidate, it is still traced up to the position where a gap exists.

Similarly, L2 is also a ruled line candidate in histogram information A, so when it is detected, the corresponding ruled line image K2 in image B is traced, and the coordinates of its left end LP2 and right end RP2 are read as its end points.

Through the above processing, the left and right end point coordinates of three ruled lines, LP1 to RP1, LP2 to RP2, and LP3 to RP3, are detected.
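
The end point tracing described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the traced line is a list of 0/1 values (for instance the three-line OR of the line), the white pixel threshold max_gap is in pixels, and is_candidate is a hypothetical callback standing in for a lookup into the histogram information. The function name trace_segments is also hypothetical.

```python
def trace_segments(line, start, max_gap, is_candidate):
    """Trace black pixel runs on `line` from `start` to the right.

    line:         list of 0/1 values (1 = black pixel).
    max_gap:      white runs up to this length are bridged.
    is_candidate: hypothetical callback; is_candidate(x) is True when
                  the histogram information marks column x as part of
                  a ruled line candidate.
    Returns a list of (left, right) end points, inclusive.
    """
    segments = []
    x, n = start, len(line)
    while x < n:
        while x < n and not line[x]:  # skip to the next black pixel
            x += 1
        if x >= n:
            break
        left = right = x
        gap_start = None
        while x < n:
            if line[x]:
                right = x
                x += 1
                continue
            run = x
            while run < n and not line[run]:  # measure the white run
                run += 1
            if run - x > max_gap:
                gap_start = x  # a real gap: this segment ends here
                break
            x = run  # small gap: treat it as a break or blur
        segments.append((left, right))
        # Continue past the gap only where a candidate is indicated.
        if gap_start is None or not is_candidate(gap_start):
            break
        x = gap_start
    return segments
```

In the usage below, the first gap falls where the histogram information still indicates a candidate, so tracing resumes and yields a second segment. The second gap does not, so tracing of the line ends there.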

The flow of the ruled line recognition method in the embodiment of the present invention is shown in FIGS. 12 to 17.

FIG. 12 shows the flow of processing from reading the document to extracting the histogram peaks.

FIG. 13 shows the flow from the processing that determines ruled line candidates for the peaks extracted from the OR histogram to the processing that determines ruled line candidates using the NR histogram.

FIG. 14 shows the flow from partway through the ruled line candidate determination using the NR histogram to the continuity determination processing.

FIG. 15 shows the flow of the double line determination processing.

FIG. 16 shows the flow of processing up to the creation of the tracing data used to detect the end points of ruled line candidates.

FIG. 17 shows the flow from partway through the processing for tracing ruled line candidates using the OR image, through ruled line deletion and character recognition, to the end of processing.

The flow of FIGS. 12 to 17 is outlined below, following the order of the numbers shown in the figures.

■ Read the document original with an image scanner or the like.

■ Create a compressed image from the image of the document that was read.

■ Count the black pixels on each line of the compressed image to create the NR histogram.

■ From the compressed image, create an OR image from the OR data of the black pixels of each line and the next line, and count its black pixels to create the OR histogram.

■ Scan the OR histogram values perpendicular to the ruled line direction and extract the peaks at which the histogram value turns from increasing to decreasing.

■ For each extracted peak, examine the value of the valleys between peaks, the peak height, and the peak width to determine whether the peak shows the characteristics of a ruled line.

■ For each line whose peak became a temporary ruled line candidate under the OR histogram, determine with the NR histogram whether it is a ruled line candidate.

■ Determine the continuity of the extracted ruled line candidates.

■ Among the continuous ruled lines, extract double ruled lines and delete one line of each pair.

■ Create the data for detecting the ruled lines on which tracing is performed to read the ruled line end points.

■ For the next ruled line on the same line, adjacent across a gap, obtain the left end coordinate and start tracing that ruled line.

■ For each line newly traced, create the OR image of its black pixels with those of the lines above and below it.

■ Delete the extracted true ruled lines from the image of the document that was read.

■ Separate the character patterns one character at a time from the image with the ruled lines deleted, and perform character recognition.
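
The histogram and peak extraction steps of this flow can be sketched in Python as follows. This is a minimal sketch, assuming the compressed image is a list of rows of 0/1 values with 1 for a black pixel, and that the last line is ORed with an all-white line. The function names nr_histogram, or_histogram, and peaks are hypothetical, and the peak test here covers only the turn from increasing to decreasing, not the later height and width checks.

```python
def nr_histogram(image):
    """Normal (NR) histogram: black pixel count of each line."""
    return [sum(row) for row in image]

def or_histogram(image):
    """OR histogram: for each line, OR it pixel by pixel with the
    next line and count the black pixels of the result."""
    hist = []
    for y, row in enumerate(image):
        nxt = image[y + 1] if y + 1 < len(image) else [0] * len(row)
        hist.append(sum(a | b for a, b in zip(row, nxt)))
    return hist

def peaks(hist):
    """Indices where the histogram turns from increasing to decreasing."""
    result = []
    rising = False
    for y in range(1, len(hist)):
        if hist[y] > hist[y - 1]:
            rising = True
        elif hist[y] < hist[y - 1]:
            if rising:
                result.append(y - 1)
            rising = False
    return result
```

A ruled line that is split across two neighboring lines, for example because the document was set at a slight tilt, gives two modest NR values but a single tall OR peak, which is why the OR histogram is used for the first pass.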

[Effect of the invention]

According to the present invention, ruled lines are recognized using the OR image of the pixels of each line and the next line, so ruled lines are not overlooked because of breaks, misalignment, and the like. In addition, a line judged to be a temporary ruled line candidate from the OR data is further judged using the original image, so ruled lines are read with high accuracy.

Furthermore, the ruled lines are deleted after they are recognized, and character recognition is performed on the character patterns alone, so highly accurate character recognition is possible even for tabular documents.
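
The ruled line deletion that precedes character recognition can be sketched in Python as follows. This is a minimal sketch, assuming each recognized ruled line is given as a line index with inclusive left and right end points, and that a fixed band of lines around it is cleared. The half_width parameter and the function name erase_ruled_lines are assumptions for illustration. A real implementation would also need to protect character strokes that touch the ruled line, which this sketch does not attempt.

```python
def erase_ruled_lines(image, segments, half_width=1):
    """Return a copy of `image` with the ruled line segments cleared.

    image:      list of rows, each a list of 0/1 (1 = black pixel).
    segments:   list of (y, left, right), end points inclusive.
    half_width: lines cleared above and below y (assumed value).
    """
    out = [row[:] for row in image]  # leave the input untouched
    for y, left, right in segments:
        top = max(0, y - half_width)
        bottom = min(len(out), y + half_width + 1)
        for j in range(top, bottom):
            for x in range(left, right + 1):
                out[j][x] = 0
    return out
```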

[Brief explanation of drawings]

FIG. 1 is a diagram showing the basic configuration of the ruled line determination method of the present invention.
FIG. 2 is a diagram showing the configuration of an apparatus according to an embodiment of the present invention.
FIG. 3(a) is a diagram showing an example of a compressed image.
FIG. 3(b) is a diagram showing an example of a histogram.
FIG. 4 is an explanatory diagram of the case where the original document is set at a tilt.
FIG. 5(a) is a diagram showing an example of an NR histogram.
FIG. 5(b) is a diagram showing an example of an OR histogram.
FIG. 6 is an explanatory diagram of the OR histogram.
FIG. 7(a) is a diagram showing an example of a method for explaining the peaks of the OR histogram.
FIG. 8(a) is a diagram showing the determination result based on the OR histogram.
FIG. 8(b) is a diagram showing the determination result based on the NR histogram.
FIG. 9(a) is a diagram illustrating the temporary ruled line candidate conversion process.
FIG. 9(b) is an explanatory diagram of the double line conversion process.
FIG. 10 is a diagram showing the ruled line tracing method for recognizing end points.
FIG. 11 is a diagram showing the method for recognizing the end points of ruled lines.
FIG. 12 is a diagram showing the flow of the recognition method of the present invention.
FIG. 13 is a diagram showing the flow of the part of the recognition method that follows FIG. 12.
FIG. 14 is a diagram showing the flow of the part of the recognition method that follows FIG. 13.
FIG. 15 is a diagram showing the flow of the part of the recognition method that follows FIG. 14.
FIG. 16 is a diagram showing the flow of the part of the recognition method that follows FIG. 15.
FIG. 17 is a diagram showing the flow of the part of the recognition method that follows FIG. 16.
FIG. 18 is a diagram showing a conventional character recognition method.
In the figures: tabular document; original image; NR histogram; OR image; OR histogram.

Claims

[Claims]

(1) A ruled line determination method for recognizing ruled line data in a character recognition device that inputs document data including ruled lines as image data, the method characterized by: creating a normal histogram by counting the number of black pixels on each line in the ruled line direction when the document data is scanned, and an OR histogram by counting, line by line, the black pixels of an OR image created from the OR data of the black pixels of pixels having the same coordinate in the ruled line direction on two consecutive lines; determining the peaks of the histogram data at which the histogram data of the lines of the OR histogram turns from increasing to decreasing; comparing the height or width of the peak of any line of interest with the adjacent peaks or with the histogram data of the preceding and following lines, thereby classifying the line of interest as a ruled line candidate regarded as a true ruled line, a temporary ruled line candidate that appears to be a ruled line, or not a ruled line; treating a line determined to be a true ruled line by the OR histogram as a final ruled line candidate; further judging, by means of the normal histogram, the likelihood that a line determined to be a temporary ruled line candidate is a ruled line; and converting a line determined to be a true ruled line by the normal histogram from a temporary ruled line candidate to a ruled line candidate.

(2) A character recognition device characterized by comprising ruled line deletion means for deleting a ruled line candidate determined to be a true ruled line by the ruled line determination method according to claim (1), wherein character recognition is performed after the ruled line data is deleted.