JPH10334184A - Ruled line erasing method and device, table processing method and device, character recognition method and device and recording medium - Google Patents

Ruled line erasing method and device, table processing method and device, character recognition method and device and recording medium

Info

Publication number
JPH10334184A
Authority
JP
Japan
Prior art keywords
ruled line
black
extracting
extracted
binary image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP9141356A
Other languages
Japanese (ja)
Inventor
Goro Bessho
吾朗 別所
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ricoh Co Ltd
Original Assignee
Ricoh Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ricoh Co Ltd
Priority to JP9141356A
Publication of JPH10334184A
Legal status: Pending

Landscapes

  • Character Input (AREA)

Abstract

PROBLEM TO BE SOLVED: To achieve accurate ruled line recognition and character recognition even when, for example, a character touches a ruled line, by extracting black runs of at least a prescribed threshold length from a binary image, extracting as a ruled line the rectangle that contains the merged black runs, and converting the black pixels of the binary image that constitute the ruled line into white pixels. SOLUTION: A binary image input unit 1 reads an original such as a document and stores it in a binary image memory 2. A black run extraction unit 3 extracts black runs whose length is at least a fixed value from the binary image memory 2 and stores them in a black run memory 4. A ruled line extraction unit 5 merges black runs that lie within a predetermined distance of one another, extracts them as ruled lines, and stores them in a ruled line memory 6. A ruled line thickness detection unit 7 reads the ruled lines from the ruled line memory 6 and calculates their thickness, and a frame extraction unit 8 combines the ruled lines, extracts frame areas, and stores them in a frame memory 9. A ruled line erasing unit 10 then refers to the binary image memory 2 and the ruled line memory 6 and erases the ruled lines by replacing the black pixels of the binary image that correspond to ruled lines with white pixels.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a ruled line erasing method and device, a table processing method and device, a character recognition method and device, and a recording medium, all directed at document images that contain ruled lines or underlines.

[0002]

[Prior Art] In general, when a document recognition device processes a document, the document image is often classified into character areas, table areas, and figure and other areas, and processing suited to each area is applied. For table areas in particular, there is a table processing method that recognizes the ruled lines constituting the table and then extracts and recognizes the characters enclosed by those ruled lines (see Japanese Patent Laid-Open No. 3-172984).

[0003]

[Problems to Be Solved by the Invention] With the conventional method, however, when a character touches a ruled line or is written so that it extends through a ruled-line frame, part of the character is erased, and character recognition therefore cannot be performed accurately.

[0004] An object of the present invention is to provide a ruled line erasing method and device, a table processing method and device, a character recognition method and device, and a recording medium that allow accurate ruled line recognition and character recognition even when a character touches a ruled line or is written so that it extends through a ruled-line frame.

[0005]

[Means for Solving the Problems] To achieve the above object, the invention of claim 1 is characterized by: extracting, from a binary image, black runs whose length is at least a predetermined threshold; merging those extracted black runs that lie within a predetermined distance of one another and thereby extracting, as a ruled line, a rectangle containing all of the merged black runs; and erasing the ruled line by converting the black pixels of the binary image that constitute the ruled line into white pixels.

[0006] The invention of claim 2 is characterized by comprising: means for inputting a document image as a binary image; means for extracting, from the binary image, black runs whose length is at least a predetermined threshold; means for merging those extracted black runs that lie within a predetermined distance of one another and thereby extracting, as a ruled line, a rectangle containing all of the merged black runs; and means for erasing the ruled line by converting the black pixels of the binary image that constitute the ruled line into white pixels.

[0007] The invention of claim 3 is characterized by: extracting, from a binary image, black runs whose length is at least a predetermined threshold; merging those extracted black runs that lie within a predetermined distance of one another and thereby extracting, as a ruled line, a rectangle containing all of the merged black runs; extracting a frame by combining the ruled lines; erasing the ruled lines by converting the black pixels of the binary image that constitute the ruled lines into white pixels; when erasing a ruled line produces a gap of white pixels whose width equals the thickness of the ruled line, converting that gap of white pixels into black pixels; setting a frame area wider than the extracted frame area; and extracting characters from within the frame so set.

[0008] The invention of claim 4 is characterized by comprising: means for inputting a document image as a binary image; means for extracting, from the binary image, black runs whose length is at least a predetermined threshold; means for merging those extracted black runs that lie within a predetermined distance of one another and thereby extracting, as a ruled line, a rectangle containing all of the merged black runs; means for extracting a frame by combining the ruled lines; means for erasing the ruled lines; means for restoring a character image by converting, when erasing a ruled line produces within the character image a gap of white pixels whose width equals the thickness of the ruled line, that gap of white pixels into black pixels; means for setting a frame area wider than the extracted frame area; and means for extracting characters from within the frame so set.

[0009] The invention of claim 5 is characterized by recognizing the characters extracted by the method of claim 3.

[0010] The invention of claim 6 is characterized by comprising: means for inputting a document image as a binary image; means for extracting, from the binary image, black runs whose length is at least a predetermined threshold; means for merging those extracted black runs that lie within a predetermined distance of one another and thereby extracting, as a ruled line, a rectangle containing all of the merged black runs; means for extracting a frame by combining the ruled lines; means for erasing the ruled lines; means for restoring a character image by converting, when erasing a ruled line produces within the character image a gap of white pixels whose width equals the thickness of the ruled line, that gap of white pixels into black pixels; means for setting a frame area wider than the extracted frame area; means for extracting characters from within the frame so set; and means for recognizing the extracted characters.

[0011] The invention of claim 7 is characterized by being a computer-readable recording medium on which is recorded a program for causing a computer to implement: a function of inputting a document image as a binary image; a function of extracting, from the binary image, black runs whose length is at least a predetermined threshold; a function of merging those extracted black runs that lie within a predetermined distance of one another and thereby extracting, as a ruled line, a rectangle containing all of the merged black runs; a function of extracting a frame by combining the ruled lines; a function of erasing the ruled lines; a function of restoring a character image by converting, when erasing a ruled line produces within the character image a gap of white pixels whose width equals the thickness of the ruled line, that gap of white pixels into black pixels; a function of setting a frame area wider than the extracted frame area; a function of extracting characters from within the frame so set; and a function of recognizing the extracted characters.

[0012]

[Embodiments of the Invention] An embodiment of the present invention is described in detail below with reference to the drawings.
<Embodiment 1> FIG. 1 shows the configuration of Embodiment 1 of the present invention, and FIG. 2 shows the processing flowchart of Embodiment 1. The processing operation of the present invention is described below with reference to FIGS. 1 and 2.

[0013] A binary image input unit 1, such as a scanner, reads an original such as a document or form as a binary image and stores it in a binary image memory 2 (step 101). A black run extraction unit 3 extracts black runs whose length is at least a fixed value from the binary image memory 2 and stores them in a black run memory 4 (step 102). Next, a ruled line extraction unit 5 reads the black runs from the black run memory 4, merges those that lie within a predetermined distance of one another as constituting the same ruled line, extracts the rectangle containing all of the merged black runs as a ruled line, and stores it in a ruled line memory 6 (step 103).
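Steps 102 and 103 can be illustrated with a minimal Python sketch. It is not taken from the patent: the image representation (a list of rows, 1 = black, 0 = white), the names `extract_black_runs` and `merge_runs`, the parameters `min_len` and `max_gap`, and the single greedy merging pass are all assumptions made for illustration. Only horizontal runs are handled here; the patent applies the same processing in the sub-scanning direction as well.

```python
from typing import List, Tuple

Run = Tuple[int, int, int]        # (row, x_start, x_end), x_end inclusive
Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive corners


def extract_black_runs(image: List[List[int]], min_len: int) -> List[Run]:
    """Collect horizontal runs of black pixels (value 1) at least min_len long."""
    runs: List[Run] = []
    for y, row in enumerate(image):
        x = 0
        while x < len(row):
            if row[x] == 1:
                start = x
                while x < len(row) and row[x] == 1:
                    x += 1
                if x - start >= min_len:
                    runs.append((y, start, x - 1))
            else:
                x += 1
    return runs


def merge_runs(runs: List[Run], max_gap: int) -> List[Rect]:
    """Greedily merge nearby runs into bounding rectangles (one pass only).

    Assumption: "within a predetermined distance" is read as a row
    distance and a horizontal distance that are both <= max_gap.
    """
    rects: List[Rect] = []
    for y, xs, xe in sorted(runs):
        for i, (x0, y0, x1, y1) in enumerate(rects):
            row_dist = y - y1                 # runs arrive in row order
            h_dist = max(xs - x1, x0 - xe)    # negative when spans overlap
            if row_dist <= max_gap and h_dist <= max_gap:
                rects[i] = (min(x0, xs), y0, max(x1, xe), max(y1, y))
                break
        else:
            rects.append((xs, y, xe, y))
    return rects
```

For an image whose rows 1 and 2 hold long black runs, `merge_runs(extract_black_runs(image, 5), 1)` yields a single rectangle covering both rows. A fuller implementation might repeat the merge until no two rectangles are within `max_gap` of each other.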

[0014] Next, a ruled line thickness detection unit 7 reads a ruled line from the ruled line memory 6, counts the number of black pixels (black) in the black runs constituting the ruled line, and computes the thickness of the ruled line (thickness) by dividing that count by the length of the ruled line (length) (step 104).

[0015]

thickness = black / length

The above processing is performed not only in the main scanning direction but also in the sub-scanning direction.
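As a worked illustration of the formula thickness = black / length, the following Python sketch computes the thickness of one horizontal ruled line from its constituent runs. The run format `(row, x_start, x_end)` and the choice of taking the line length as the horizontal extent of the merged runs are assumptions for this example; the patent does not spell out how the length is measured.

```python
from typing import List, Tuple

Run = Tuple[int, int, int]  # (row, x_start, x_end), x_end inclusive


def ruled_line_thickness(runs: List[Run]) -> float:
    """Return black / length for the runs that make up one ruled line."""
    # black: total number of black pixels over all runs of the line
    black = sum(xe - xs + 1 for _, xs, xe in runs)
    # length: horizontal extent of the line (assumed definition)
    length = max(xe for _, _, xe in runs) - min(xs for _, xs, _ in runs) + 1
    return black / length
```

With two runs of 7 and 8 pixels spanning 8 columns, the result is 15 / 8 = 1.875, that is, a line roughly two pixels thick. A caller that needs a pixel count would have to round this value, which is again an assumption beyond the text.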

[0016] A frame extraction unit 8 combines ruled lines in the main scanning direction with ruled lines in the sub-scanning direction, extracts each region enclosed on all four sides as a frame area, and stores it in a frame memory 9 (step 105). Next, a ruled line erasing unit 10 refers to the binary image memory 2 and the ruled line memory 6 and erases the ruled lines by replacing the black pixels of the binary image that correspond to ruled lines with white pixels (step 106).
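The erasure in step 106 can be sketched as follows. This is an illustrative reading, not the patent's implementation: ruled lines are assumed to be stored as inclusive rectangles `(x0, y0, x1, y1)`, every pixel inside a rectangle is set to white, and the function name `erase_ruled_lines` is hypothetical. The input image is left untouched so that later steps can still consult the original.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive corners


def erase_ruled_lines(image: List[List[int]], rects: List[Rect]) -> List[List[int]]:
    """Return a copy of image with every ruled-line rectangle set to white (0)."""
    out = [row[:] for row in image]
    for x0, y0, x1, y1 in rects:
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                out[y][x] = 0
    return out
```

Note that any character stroke crossing a rectangle is cut as well, which is exactly the damage the subsequent interpolation step is meant to repair.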

[0017] Next, a character pixel interpolation unit 11 refers to the binary image memory 2 and, among the black pixels adjoining the top and bottom of a region erased as a ruled line, interpolates black pixels wherever the spacing is within a certain value (step 107).

[0018] This is explained using the example of FIG. 3. (a) shows an example image after ruled line erasure. Let (Xus, Xue) be the positions of the black pixels adjoining the upper side of the erased ruled line region and (Xds, Xde) the positions of the black pixels adjoining its lower side. If either of the conditions

|Xus − Xde| < Th
|Xue − Xds| < Th

is satisfied (Th is a predetermined threshold), the pixels at the corresponding coordinates are replaced with black pixels over a rectangle whose width is (Min(Xus, Xds) − Max(Xue, Xde)) and whose height is thickness. (b) shows the image after interpolation. In this way, characters split by ruled line erasure can be restored.
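The interpolation rule can be sketched in Python under several stated assumptions. Xus/Xue and Xds/Xde are taken to be the start and end columns of a black segment in the row just above and just below the erased region; the filled rectangle is taken to span columns min(Xus, Xds) through max(Xue, Xde); and the rows filled are those of the erased rectangle rather than a separately computed thickness. The names `black_segments` and `interpolate_gap` are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive corners


def black_segments(row: List[int], x0: int, x1: int) -> List[Tuple[int, int]]:
    """Return (start, end) columns of black segments of row within [x0, x1]."""
    segs, start = [], None
    for x in range(x0, x1 + 1):
        if row[x] == 1 and start is None:
            start = x
        elif row[x] != 1 and start is not None:
            segs.append((start, x - 1))
            start = None
    if start is not None:
        segs.append((start, x1))
    return segs


def interpolate_gap(image: List[List[int]], rect: Rect, th: int) -> List[List[int]]:
    """Refill the erased rectangle where strokes above and below line up."""
    x0, y0, x1, y1 = rect
    out = [row[:] for row in image]
    if y0 == 0 or y1 == len(image) - 1:
        return out  # no row above or below to compare against
    upper = black_segments(image[y0 - 1], x0, x1)
    lower = black_segments(image[y1 + 1], x0, x1)
    for xus, xue in upper:
        for xds, xde in lower:
            # Either condition from the text is enough to trigger a fill.
            if abs(xus - xde) < th or abs(xue - xds) < th:
                for y in range(y0, y1 + 1):
                    for x in range(min(xus, xds), max(xue, xde) + 1):
                        out[y][x] = 1
    return out
```

A vertical stroke at the same column above and below the erased row is reconnected, while strokes that are far apart horizontally are left alone.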
e, Xde)), and replace the corresponding coordinates in a rectangle with a height of “thickness” with black pixels. (B) shows an image after interpolation. As a result, it is possible to restore the character divided by the ruled line erasure.

[0019] A character extraction unit 12 sets a region somewhat wider than the frame area extracted in step 105 so that the characters fit within the region, performs rectangle extraction within that region on the image in which the characters have been restored, extracts the characters in the frame, and stores them in a character image memory 13 (step 108). A character recognition unit 14 then recognizes the extracted characters and produces text output (step 109).
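Step 108 can be illustrated with a short Python sketch. The patent does not specify how much wider the region is or how rectangle extraction is carried out, so the `margin` parameter, the clamping to the image bounds, and the use of 8-connected component labelling to obtain bounding rectangles are all assumptions; `expand_frame` and `extract_rectangles` are hypothetical names.

```python
from collections import deque
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive corners


def expand_frame(frame: Rect, margin: int, width: int, height: int) -> Rect:
    """Widen a frame by margin pixels on each side, clamped to the image."""
    x0, y0, x1, y1 = frame
    return (max(0, x0 - margin), max(0, y0 - margin),
            min(width - 1, x1 + margin), min(height - 1, y1 + margin))


def extract_rectangles(image: List[List[int]], region: Rect) -> List[Rect]:
    """Bounding boxes of 8-connected black components inside region."""
    x0, y0, x1, y1 = region
    seen = set()
    boxes: List[Rect] = []
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            if image[y][x] != 1 or (x, y) in seen:
                continue
            seen.add((x, y))
            queue = deque([(x, y)])
            bx0, by0, bx1, by1 = x, y, x, y
            while queue:  # breadth-first flood fill of one component
                cx, cy = queue.popleft()
                bx0, by0 = min(bx0, cx), min(by0, cy)
                bx1, by1 = max(bx1, cx), max(by1, cy)
                for ny in range(cy - 1, cy + 2):
                    for nx in range(cx - 1, cx + 2):
                        if (x0 <= nx <= x1 and y0 <= ny <= y1
                                and image[ny][nx] == 1 and (nx, ny) not in seen):
                            seen.add((nx, ny))
                            queue.append((nx, ny))
            boxes.append((bx0, by0, bx1, by1))
    return boxes
```

Widening the frame before extraction is what lets a character that pokes through the frame border be captured whole instead of being clipped at the original frame edge.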

[0020] <Embodiment 2> The present invention is not limited to the embodiment described above and can also be implemented in software. In that case, as shown in FIG. 4, a general-purpose processing device comprising a CPU, ROM, RAM, a display device, a hard disk, a keyboard, a CD-ROM drive, a scanner, and so on is prepared, and a program that implements the table processing and character recognition functions of the present invention is recorded on a computer-readable recording medium such as a CD-ROM. Images of documents and the like input from the scanner or a similar device are temporarily stored on the hard disk or the like. When the program is started, the temporarily stored image data is read, the table processing and character recognition processing are executed, and the results are output to a display, printer, or the like.

[0021]

[Effects of the Invention] As described above, according to the present invention, characters can be recognized accurately even when a character touches a ruled line or is written so that it extends through a ruled-line frame.

[Brief Description of the Drawings]

[FIG. 1] Shows the configuration of Embodiment 1 of the present invention.

[FIG. 2] Shows the processing flowchart of Embodiment 1.

[FIG. 3] (a) shows an example image after ruled line erasure, and (b) shows the image after black pixel interpolation.

[FIG. 4] Shows the configuration of Embodiment 2 of the present invention.

[Explanation of Symbols]

1 Binary image input unit
2 Binary image memory
3 Black run extraction unit
4 Black run memory
5 Ruled line extraction unit
6 Ruled line memory
7 Ruled line thickness detection unit
8 Frame extraction unit
9 Frame memory
10 Ruled line erasing unit
11 Character pixel interpolation unit
12 Character extraction unit
13 Character image memory
14 Character recognition unit

Claims (7)

[Claims]
[Claim 1] A ruled line erasing method characterized by: extracting, from a binary image, black runs whose length is at least a predetermined threshold; merging those extracted black runs that lie within a predetermined distance of one another and thereby extracting, as a ruled line, a rectangle containing all of the merged black runs; and erasing the ruled line by converting the black pixels of the binary image that constitute the ruled line into white pixels.
[Claim 2] A ruled line erasing device characterized by comprising: means for inputting a document image as a binary image; means for extracting, from the binary image, black runs whose length is at least a predetermined threshold; means for merging those extracted black runs that lie within a predetermined distance of one another and thereby extracting, as a ruled line, a rectangle containing all of the merged black runs; and means for erasing the ruled line by converting the black pixels of the binary image that constitute the ruled line into white pixels.
[Claim 3] A table processing method characterized by: extracting, from a binary image, black runs whose length is at least a predetermined threshold; merging those extracted black runs that lie within a predetermined distance of one another and thereby extracting, as a ruled line, a rectangle containing all of the merged black runs; extracting a frame by combining the ruled lines; erasing the ruled lines by converting the black pixels of the binary image that constitute the ruled lines into white pixels; when erasing a ruled line produces a gap of white pixels whose width equals the thickness of the ruled line, converting that gap of white pixels into black pixels; setting a frame area wider than the extracted frame area; and extracting characters from within the frame so set.
[Claim 4] A table processing device characterized by comprising: means for inputting a document image as a binary image; means for extracting, from the binary image, black runs whose length is at least a predetermined threshold; means for merging those extracted black runs that lie within a predetermined distance of one another and thereby extracting, as a ruled line, a rectangle containing all of the merged black runs; means for extracting a frame by combining the ruled lines; means for erasing the ruled lines; means for restoring a character image by converting, when erasing a ruled line produces within the character image a gap of white pixels whose width equals the thickness of the ruled line, that gap of white pixels into black pixels; means for setting a frame area wider than the extracted frame area; and means for extracting characters from within the frame so set.
[Claim 5] A character recognition method characterized by recognizing the characters extracted by the method of claim 3.
[Claim 6] A character recognition device characterized by comprising: means for inputting a document image as a binary image; means for extracting, from the binary image, black runs whose length is at least a predetermined threshold; means for merging those extracted black runs that lie within a predetermined distance of one another and thereby extracting, as a ruled line, a rectangle containing all of the merged black runs; means for extracting a frame by combining the ruled lines; means for erasing the ruled lines; means for restoring a character image by converting, when erasing a ruled line produces within the character image a gap of white pixels whose width equals the thickness of the ruled line, that gap of white pixels into black pixels; means for setting a frame area wider than the extracted frame area; means for extracting characters from within the frame so set; and means for recognizing the extracted characters.
[Claim 7] A computer-readable recording medium on which is recorded a program for causing a computer to implement: a function of inputting a document image as a binary image; a function of extracting, from the binary image, black runs whose length is at least a predetermined threshold; a function of merging those extracted black runs that lie within a predetermined distance of one another and thereby extracting, as a ruled line, a rectangle containing all of the merged black runs; a function of extracting a frame by combining the ruled lines; a function of erasing the ruled lines; a function of restoring a character image by converting, when erasing a ruled line produces within the character image a gap of white pixels whose width equals the thickness of the ruled line, that gap of white pixels into black pixels; a function of setting a frame area wider than the extracted frame area; a function of extracting characters from within the frame so set; and a function of recognizing the extracted characters.
JP9141356A 1997-05-30 1997-05-30 Ruled line erasing method and device, table processing method and device, character recognition method and device and recording medium Pending JPH10334184A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP9141356A JPH10334184A (en) 1997-05-30 1997-05-30 Ruled line erasing method and device, table processing method and device, character recognition method and device and recording medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP9141356A JPH10334184A (en) 1997-05-30 1997-05-30 Ruled line erasing method and device, table processing method and device, character recognition method and device and recording medium

Publications (1)

Publication Number Publication Date
JPH10334184A true JPH10334184A (en) 1998-12-18

Family

ID=15290082

Family Applications (1)

Application Number Title Priority Date Filing Date
JP9141356A Pending JPH10334184A (en) 1997-05-30 1997-05-30 Ruled line erasing method and device, table processing method and device, character recognition method and device and recording medium

Country Status (1)

Country Link
JP (1) JPH10334184A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004088587A1 (en) * 2003-03-28 2004-10-14 National Institute Of Information And Communications Technology, Independent Administrative Agency Image processing method and image processing device

Similar Documents

Publication Publication Date Title
US20070237394A1 (en) Image processor for character recognition
JP4655335B2 (en) Image recognition apparatus, image recognition method, and computer-readable recording medium on which image recognition program is recorded
JP2001297303A (en) Method and device for recognizing document image and computer readable recording medium
JP2000207489A (en) Character extracting method and device and record medium
JP5049922B2 (en) Image processing apparatus and image processing method
JP2002015280A (en) Device and method for image recognition, and computer- readable recording medium with recorded image recognizing program
JP2007156741A (en) Character extraction method, character extraction device, and program
JPH10334184A (en) Ruled line erasing method and device, table processing method and device, character recognition method and device and recording medium
JP4420440B2 (en) Image processing apparatus, image processing method, character recognition apparatus, program, and recording medium
JP2003046746A (en) Method and apparatus for processing image
JP2000082110A (en) Ruled line deletion device, character picture extraction device, ruled line deletion method, character picture extraction method and storage medium
JP4040231B2 (en) Character extraction method and apparatus, and storage medium
JP2796561B2 (en) Tabular document recognition method
JP3391987B2 (en) Form recognition device
JPH08237404A (en) Selection of optical character recognition mode
JPH1049676A (en) Method for recognizing ruled line
JPH10177621A (en) Method for processing document and method for recognizing ruled line and recording medium
JP2003317107A (en) Method and device for ruled-line detection
JP4974367B2 (en) Region dividing method and apparatus, and program
JP4129902B2 (en) Ruled line erasing method, ruled line erasing apparatus, and recording medium
JP2001236464A (en) Method and device for character extraction and storage medium
JP3162414B2 (en) Ruled line recognition method and table processing method
JPH0271379A (en) Picture processor
JP2931041B2 (en) Character recognition method in table
JPH11242716A (en) Image processing method and storage medium

Legal Events

Date Code Title Description
A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20060414

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20060419

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20060619

A02 Decision of refusal

Free format text: JAPANESE INTERMEDIATE CODE: A02

Effective date: 20060816