JPH03172984A - Table processing method - Google Patents

Table processing method

Info

Publication number
JPH03172984A
JPH03172984A (application JP1314519A)
Authority
JP
Japan
Prior art keywords
frame
scanning direction
image
rectangle
black
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP1314519A
Other languages
Japanese (ja)
Other versions
JP2851089B2 (en)
Inventor
Goro Bessho
吾朗 別所
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ricoh Co Ltd
Original Assignee
Ricoh Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ricoh Co Ltd filed Critical Ricoh Co Ltd
Priority to JP1314519A priority Critical patent/JP2851089B2/en
Publication of JPH03172984A publication Critical patent/JPH03172984A/en
Application granted granted Critical
Publication of JP2851089B2 publication Critical patent/JP2851089B2/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current


Abstract

PURPOSE: To accurately cut out the characters in a frame even when the image is inclined, by removing, within the frame of interest, the ruled-line portions and the black-connected circumscribed rectangles that touch the frame. CONSTITUTION: A main-scanning-direction line segment extraction unit 15 traces black pixels connected in the main scanning direction in a table area image, extracts a rectangle enclosing each ruled line in the main scanning direction, and stores the start point coordinates (Xs, Ys) and end point coordinates (Xe, Ye) of the extracted rectangle in a main-scanning-direction line segment coordinate memory 16. A sub-scanning-direction line segment extraction unit 17 similarly stores the start and end point coordinates of the ruled lines in the sub-scanning direction in a sub-scanning-direction line segment coordinate memory 18. A black-connected circumscribed rectangle extraction unit 23 extracts the black-connected circumscribed rectangles from the in-frame image and stores their coordinates in a circumscribed rectangle memory 24. A line cutting unit 25 regards a black-connected circumscribed rectangle in contact with the frame as part of a ruled line forming the frame and removes it. A character cutting/recognition unit 27 cuts out characters from each character line image and recognizes them.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a method for processing tables in images of documents, forms, and the like in a character recognition device or similar apparatus.

[Prior Art]

When a document image is processed in a character recognition device, the image is often divided into text areas, image areas such as photographs and diagrams, table areas, and so on, and each type of area is processed separately.

For a table area, the position coordinates of the ruled lines are used to recognize each frame in the table, and characters are cut out from the image within each frame and recognized.

In such table processing, the rectangles enclosing the ruled lines in the main scanning direction and the sub-scanning direction that form a frame have conventionally been extracted, and the frame has been defined using the coordinates of the inner sides of these rectangles (as seen from inside the frame).

[Problem to be Solved by the Invention]

However, there is a problem that, when a document image is input at an inclination, the characters within a frame can no longer be cut out correctly.

For example, consider an image such as the one shown in Fig. 4. The solid lines are the ruled lines (line segments), but each is extracted as a rectangle indicated by the broken lines. Fig. 5 illustrates the rectangle of a ruled line in the main scanning direction (X direction); 51 is the actual ruled line as input, and 52 is its enclosing rectangle. The frames surrounded by the ruled lines are then recognized, and conventionally the inner coordinates of the ruled-line rectangles are used for this purpose. For example, for the main-scanning-direction rectangle that forms the upper side of a frame, the coordinate Ye shown in Fig. 5 defines the Y coordinate of the frame's upper side. Consequently, for the inclined image shown in Fig. 4, the hatched region is extracted as the frame region, and the frame becomes narrower than it actually is. As a result, characters close to the ruled lines inside the frame may protrude beyond the frame and can no longer be cut out correctly.
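To make the shrinkage concrete, the following minimal sketch uses assumed pixel values (not taken from the patent) for the enclosing rectangles of the top and bottom ruled lines of one frame in a skewed image, and compares the frame height obtained from the inner coordinates with that obtained from the outer coordinates used by the present invention.

```python
# Assumed values for illustration only: a skew of 8 pixels across the page.
top_Ys, top_Ye = 100, 108      # enclosing rectangle of the top ruled line (Y range)
bot_Ys, bot_Ye = 300, 308      # enclosing rectangle of the bottom ruled line (Y range)

# Conventional method: frame bounded by the INNER sides of the rectangles.
inner_height = bot_Ys - top_Ye      # 300 - 108 = 192 px

# Method of the present invention: frame bounded by the OUTER sides.
outer_height = bot_Ye - top_Ys      # 308 - 100 = 208 px

print(inner_height, outer_height)   # 192 208
# The inner-coordinate frame loses the skew band along both ruled lines,
# which is exactly where characters close to the ruled lines get clipped.
```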

An object of the present invention is to provide a table processing method capable of correctly cutting out and recognizing the characters in a table even when the document image is input at an inclination.

[Means for Solving the Problem]

In the table processing method of the present invention, rectangles enclosing the ruled lines in the main scanning direction and the sub-scanning direction are extracted in a table area; a frame surrounded by ruled lines is recognized using the coordinates of the outer sides of the ruled-line rectangles (as seen from inside the frame); the circumscribed rectangles of the black-connected components within the frame are obtained; the circumscribed rectangles touching the frame are removed; and the characters within the frame are cut out using the remaining circumscribed rectangles.

[Operation]

Because a frame is recognized using the outer coordinates of the ruled-line rectangles, the width of the recognized frame never becomes extremely narrower than the actual width even when the document is inclined, so missing characters can be prevented.

For example, for an image of an inclined table like the one shown in Fig. 4, the frame recognized according to the present invention becomes wider, as shown in Fig. 3.

On the other hand, because the outer coordinates of the ruled-line rectangles are used, parts of the ruled lines forming the frame are included inside the recognized frame. However, by excluding those black-connected circumscribed rectangles within the frame that touch the frame, such ruled-line segments can be removed.

Therefore, even for an inclined document image, the characters in the table can be cut out and recognized correctly.

[Embodiment]

Fig. 1 is a block diagram showing an embodiment of the present invention, and Fig. 2 is a flowchart of the processing.

A document is read by a binary image input unit 11 such as a scanner, and the binary image is stored in a binary image memory 12 (processing step 31). For this document image, a table area recognition unit 13 recognizes the table areas and stores their images in a table area image memory 14 (processing step 32). This table area recognition can be performed either automatically, for example by using a run-length distribution, or by having the areas specified externally with a mouse or the like.

In a main-scanning-direction line segment extraction unit 15, black pixels connected in the main scanning direction are traced in the table area image, a rectangle enclosing each ruled line in the main scanning direction is extracted, and its start point coordinates (Xs, Ys) and end point coordinates (Xe, Ye) are stored in a main-scanning-direction line segment coordinate memory 16 (see Fig. 5; processing step 33).

In a sub-scanning-direction line segment extraction unit 17, rectangles enclosing the ruled lines in the sub-scanning direction are similarly extracted from the table area image, and their start and end point coordinates are stored in a sub-scanning-direction line segment coordinate memory 18 (processing step 34).
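As a rough sketch of how such ruled-line rectangles might be extracted (one possible implementation, not the patent's actual code; the run-length threshold `min_len` and the function names are assumptions), long horizontal runs of black pixels can be traced row by row and merged into one enclosing rectangle per ruled line, with the sub-scanning direction handled by transposing the image:

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (Xs, Ys, Xe, Ye)

def extract_horizontal_rules(img: List[List[int]], min_len: int = 30) -> List[Rect]:
    """Trace runs of black pixels (value 1) along the main scanning direction
    and merge touching runs into rectangles enclosing each horizontal ruled line."""
    rects: List[Rect] = []
    for y, row in enumerate(img):
        x = 0
        while x < len(row):
            if row[x] == 1:
                x0 = x
                while x < len(row) and row[x] == 1:
                    x += 1
                if x - x0 >= min_len:  # long enough to be a ruled line, not a stroke
                    # merge with an existing rectangle that overlaps in X and touches in Y
                    for i, (xs, ys, xe, ye) in enumerate(rects):
                        if ye >= y - 1 and not (x - 1 < xs or x0 > xe):
                            rects[i] = (min(xs, x0), ys, max(xe, x - 1), y)
                            break
                    else:
                        rects.append((x0, y, x - 1, y))
            else:
                x += 1
    return rects

def extract_vertical_rules(img: List[List[int]], min_len: int = 30) -> List[Rect]:
    """Same idea in the sub-scanning direction, via the transposed image."""
    t = [list(col) for col in zip(*img)]
    return [(ys, xs, ye, xe) for (xs, ys, xe, ye) in extract_horizontal_rules(t, min_len)]
```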

Next, in a frame recognition unit 19, frames are recognized from combinations of the main-scanning-direction ruled lines and the sub-scanning-direction ruled lines, and, for example, the coordinates of the diagonal vertices of each frame are stored in a frame coordinate memory 20 (processing step 35). At this point, as described above, the coordinates of the outer sides of the ruled-line rectangles forming the frame (the upper side for the top ruled line, the lower side for the bottom ruled line, the left side for the left ruled line, and the right side for the right ruled line) are used to obtain the coordinates of the diagonal vertices of the frame. In this way, a frame region such as the hatched region shown in Fig. 3 is recognized.
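The outer-coordinate rule described here could be expressed as in the following sketch (a hypothetical helper; it assumes each frame has already been associated with its four ruled-line rectangles, each given as `(Xs, Ys, Xe, Ye)`):

```python
from typing import Tuple

Rect = Tuple[int, int, int, int]   # (Xs, Ys, Xe, Ye)

def frame_from_rules(top: Rect, bottom: Rect, left: Rect, right: Rect) -> Rect:
    """Diagonal vertices of a frame taken from the OUTER sides of its four
    ruled-line rectangles (processing step 35)."""
    x0 = left[0]      # outer (left) side of the left ruled line
    y0 = top[1]       # outer (upper) side of the top ruled line
    x1 = right[2]     # outer (right) side of the right ruled line
    y1 = bottom[3]    # outer (lower) side of the bottom ruled line
    return (x0, y0, x1, y1)
```

With the assumed values from the earlier example, the frame spans the full Y range 100 to 308 instead of stopping at the inner sides, so a skewed ruled line no longer shrinks the frame.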

Next, in a frame area extraction unit 21, the image within each frame is cut out from the table area image according to the frame coordinates and stored in an in-frame image memory 22 (processing step 36).

In a black-connected circumscribed rectangle extraction unit 23, the circumscribed rectangles of the black-connected components are extracted from the in-frame image, and their coordinates are stored in a circumscribed rectangle memory 24 (processing step 37).
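One plausible way to extract these circumscribed rectangles (a sketch assuming 4-connectivity; the patent does not specify the connectivity or the labeling algorithm) is a breadth-first traversal of the black pixels in the in-frame binary image:

```python
from collections import deque
from typing import List, Tuple

def black_connected_boxes(img: List[List[int]]) -> List[Tuple[int, int, int, int]]:
    """Return the circumscribed rectangle (Xs, Ys, Xe, Ye) of every
    4-connected component of black pixels (value 1) in the in-frame image."""
    h = len(img)
    w = len(img[0]) if img else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] == 1 and not seen[y][x]:
                seen[y][x] = True
                q = deque([(x, y)])
                xs = xe = x
                ys = ye = y
                while q:
                    cx, cy = q.popleft()
                    xs, xe = min(xs, cx), max(xe, cx)
                    ys, ye = min(ys, cy), max(ye, cy)
                    for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                        if 0 <= nx < w and 0 <= ny < h and img[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((nx, ny))
                boxes.append((xs, ys, xe, ye))
    return boxes
```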

In a line cutting unit 25, the coordinates of each extracted black-connected circumscribed rectangle are compared with the coordinates of the frame to check whether the rectangle touches the frame; a black-connected circumscribed rectangle in contact with the frame is regarded as part of a ruled line forming the frame and is removed (processing step 38). Then, for the black-connected circumscribed rectangles remaining in the frame, the character size is estimated and the rectangles are merged to generate the character lines within the frame (merged rectangles of the circumscribed rectangles forming the character elements); any necessary corrections or deletions are made, and the final character line images are cut out from the in-frame image memory 22 and stored in a line image memory 26 (processing steps 39, 40, 41, 42).
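A simplified sketch of processing steps 38 to 42 under stated assumptions (character lines are taken to be horizontal, a rectangle is considered to touch the frame when it reaches the border of the in-frame image, and the character-size estimation is reduced to a simple vertical-overlap merge; all names are illustrative):

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (Xs, Ys, Xe, Ye) within the in-frame image

def drop_frame_touching(boxes: List[Box], width: int, height: int) -> List[Box]:
    """Step 38: a box reaching the border of the in-frame image is treated as
    part of a ruled line forming the frame and removed."""
    return [b for b in boxes
            if b[0] > 0 and b[1] > 0 and b[2] < width - 1 and b[3] < height - 1]

def merge_into_lines(boxes: List[Box]) -> List[Box]:
    """Steps 39-41 (simplified): merge boxes whose Y ranges overlap into one
    character-line rectangle per row of text."""
    lines: List[Box] = []
    for xs, ys, xe, ye in sorted(boxes, key=lambda b: b[1]):
        for i, (lx0, ly0, lx1, ly1) in enumerate(lines):
            if ys <= ly1 and ye >= ly0:   # vertical overlap, so same line
                lines[i] = (min(lx0, xs), min(ly0, ys), max(lx1, xe), max(ly1, ye))
                break
        else:
            lines.append((xs, ys, xe, ye))
    return lines
```

Each rectangle returned by `merge_into_lines` would then be used to cut the corresponding character-line image out of the in-frame image memory 22 for the recognition stage.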

Next, in a character cutting/recognition unit 27, characters are cut out from each character line image and recognized (processing step 44).

[Effects of the Invention]

As is clear from the above description, according to the present invention, the recognized frame does not become inappropriately narrower than it actually is even when the document image is input at an inclination, so missing character images can be prevented. Furthermore, because the ruled lines forming the frame are removed prior to character cutting, by removing those black-connected circumscribed rectangles within the recognized frame that touch the frame, characters can be cut out and recognized accurately.

[Brief Description of the Drawings]

Fig. 1 is a block diagram showing an embodiment of the present invention; Fig. 2 is a flowchart of the processing; Fig. 3 is an explanatory diagram of frame recognition; Fig. 4 is an explanatory diagram of the problem with frame recognition by the conventional method; and Fig. 5 is an explanatory diagram of a ruled-line rectangle. 11: binary image input unit; 13: table area recognition unit; 15: main-scanning-direction line segment extraction unit; 17: sub-scanning-direction line segment extraction unit; 19: frame recognition unit; 21: frame area extraction unit; 23: black-connected circumscribed rectangle extraction unit; 25: line cutting unit; 27: character cutting/recognition unit.

Claims (1)

[Claims]

(1) A table processing method characterized in that, in a table area, rectangles enclosing the ruled lines in the main scanning direction and the sub-scanning direction are extracted; a frame surrounded by ruled lines is recognized using the coordinates of the outer sides of the ruled-line rectangles; the circumscribed rectangles of the black-connected components within the frame are obtained; the circumscribed rectangles touching the frame are removed; and the characters within the frame are cut out using the remaining circumscribed rectangles.
JP1314519A 1989-11-30 1989-11-30 Table processing method Expired - Fee Related JP2851089B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP1314519A JP2851089B2 (en) 1989-11-30 1989-11-30 Table processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP1314519A JP2851089B2 (en) 1989-11-30 1989-11-30 Table processing method

Publications (2)

Publication Number Publication Date
JPH03172984A true JPH03172984A (en) 1991-07-26
JP2851089B2 JP2851089B2 (en) 1999-01-27

Family

ID=18054260

Family Applications (1)

Application Number Title Priority Date Filing Date
JP1314519A Expired - Fee Related JP2851089B2 (en) 1989-11-30 1989-11-30 Table processing method

Country Status (1)

Country Link
JP (1) JP2851089B2 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5848186A (en) * 1995-08-11 1998-12-08 Canon Kabushiki Kaisha Feature extraction system for identifying text within a table image
US5898795A (en) * 1995-12-08 1999-04-27 Ricoh Company, Ltd. Character recognition method using a method for deleting ruled lines
US7660014B2 (en) 2006-01-17 2010-02-09 Konica Minolta Business Technologies, Inc. Image processing apparatus capable of extracting rule from document image with high precision
US8208744B2 (en) 2006-01-23 2012-06-26 Konica Minolta Business Technologies, Inc. Image processing apparatus capable of accurately and quickly determining character part included in image


Also Published As

Publication number Publication date
JP2851089B2 (en) 1999-01-27

Similar Documents

Publication Publication Date Title
JP2940936B2 (en) Tablespace identification method
US5075895A (en) Method and apparatus for recognizing table area formed in binary image of document
JP4189506B2 (en) Apparatus, method and recording medium for image processing
JPH03122773A (en) Image forming device
JP4077094B2 (en) Color document image recognition device
JPH03172984A (en) Table processing method
JPH0991453A (en) Image processing method and its unit
JP2851087B2 (en) Table processing method
JP2890306B2 (en) Table space separation apparatus and table space separation method
JP2800192B2 (en) High-speed character / graphic separation device
JP4040231B2 (en) Character extraction method and apparatus, and storage medium
JP3140079B2 (en) Ruled line recognition method and table processing method
JPH0468481A (en) Character segmenting device
JPS61190679A (en) Character data processing device
JPH0728934A (en) Document image processor
JP2887803B2 (en) Document image processing device
JPH09161007A (en) Method for recognizing character in table area
JPH04106670A (en) Document picture processor
JPH05128305A (en) Area dividing method
JPH0728933A (en) Character recognition device
JP3566738B2 (en) Shaded area processing method and shaded area processing apparatus
JPH05174178A (en) Character recognizing method
JPH02253386A (en) Character recognizing device
JPH04167194A (en) Table processing system
JPH0266681A (en) Drawing processor

Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20071113

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20081113

Year of fee payment: 10

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20091113

Year of fee payment: 11

LAPS Cancellation because of no payment of annual fees