JP2851089B2 - Table processing method - Google Patents

Table processing method

Info

Publication number
JP2851089B2
Authority
JP
Japan
Prior art keywords
frame
rectangle
image
scanning direction
ruled line
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
JP1314519A
Other languages
Japanese (ja)
Other versions
JPH03172984A (en
Inventor
吾朗 別所
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ricoh Co Ltd
Original Assignee
Ricoh Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ricoh Co Ltd
Priority to JP1314519A
Publication of JPH03172984A
Application granted
Publication of JP2851089B2
Anticipated expiration
Expired - Fee Related

Description

[Detailed Description of the Invention]

[Field of Industrial Application]

The present invention relates to a method for processing tables in images of documents, forms, and the like, for use in character recognition devices and similar equipment.

[Prior Art]

When a character recognition device processes a document image, the image is often divided into character areas, image areas such as photographs and figures, table areas, and so on, and each type of area is processed separately.

For a table area, the usual approach is to recognize each frame (cell) in the table from the position coordinates of the ruled lines, then cut out and recognize the characters from the image inside each frame.

In conventional table processing of this kind, the rectangles enclosing the ruled lines that form a frame in the main scanning and sub-scanning directions are extracted, and the frame is defined using the coordinates of the inner sides of those rectangles (as seen from inside the frame).

[Problems to Be Solved by the Invention]

However, when a document image is input at a skew, the characters in a frame can no longer be cut out correctly.

Consider, for example, an image such as that shown in FIG. 4. The solid lines are the ruled lines (line segments), and each is extracted as a rectangle like those drawn with broken lines. FIG. 5 illustrates the rectangle of a ruled line in the main scanning direction (X direction): 51 is the ruled line as actually input, and 52 is its rectangle. The frame enclosed by the ruled lines is then recognized, and the conventional method uses the inner coordinates of the rule rectangles to do so. For the main-scanning-direction rectangle that forms the top side of a frame, for instance, the Y coordinate of the top side of the frame is defined by the coordinate Ye shown in FIG. 5. For the skewed image of FIG. 4, the hatched region is therefore extracted as the frame region, and the frame comes out narrower than it really is. As a result, characters close to a ruled line may fall outside the frame and fail to be cut out correctly.
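The size of this narrowing effect can be illustrated with a small calculation. The sketch below is not part of the patent: it assumes an idealized straight ruled line of a given length and thickness rotated by a small skew angle, and the function names `rule_bbox_height` and `frame_heights` are hypothetical.

```python
import math


def rule_bbox_height(length: float, thickness: float, skew_deg: float) -> float:
    """Height of the axis-aligned rectangle enclosing a straight rule of the
    given length and thickness after rotation by skew_deg (idealized model)."""
    theta = math.radians(abs(skew_deg))
    return length * math.sin(theta) + thickness * math.cos(theta)


def frame_heights(pitch: float, length: float, thickness: float, skew_deg: float):
    """Return (inner, outer) frame heights for one table row.

    pitch is assumed to be the vertical distance between the centres of the
    top and bottom rules in the scanned image. Each rule rectangle is centred
    on its rule, so the inner edges are closer together by one rectangle
    height and the outer edges are further apart by the same amount.
    """
    h = rule_bbox_height(length, thickness, skew_deg)
    return pitch - h, pitch + h


if __name__ == "__main__":
    for skew in (0.0, 1.0, 2.0):
        inner, outer = frame_heights(pitch=100, length=1000, thickness=2, skew_deg=skew)
        print(f"skew={skew:.0f} deg  inner={inner:.1f}  outer={outer:.1f}")
```

In this simplified model, a skew of only 2 degrees on a 1000-pixel rule removes more than a third of a 100-pixel row when the inner coordinates are used, which is the kind of loss that pushes characters near a rule out of the frame.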

An object of the present invention is to provide a table processing method that can correctly cut out and recognize the characters in a table even when the document image is input at a skew.

[Means for Solving the Problems]

In the table processing method of the present invention, rectangles enclosing the ruled lines in the main scanning direction and the sub-scanning direction are extracted from the table area; each frame enclosed by ruled lines is recognized using the coordinates of the outer sides of the rule rectangles (as seen from inside the frame); the circumscribed rectangles of the connected black-pixel components inside the frame are obtained; the circumscribed rectangles that touch the frame are removed; and the characters in the frame are cut out using the remaining circumscribed rectangles.

[Operation]

Because the frame is recognized using the outer coordinates of the rule rectangles, the recognized frame does not become drastically narrower than the actual frame even when the document is skewed, so loss of characters can be prevented.

For example, for a skewed table image identical to the one shown in FIG. 4, the frame recognized according to the present invention is widened as shown in FIG. 3.

On the other hand, because the outer coordinates of the rule rectangles are used, the recognized frame contains part of the ruled lines that form it. These ruled-line segments can nevertheless be removed by discarding, from among the circumscribed rectangles of the connected black-pixel components in the frame, those that touch the frame.

Therefore, even for a skewed document image, the characters in the table can be correctly cut out and recognized.

[Embodiment]

FIG. 1 is a block diagram showing an embodiment of the present invention, and FIG. 2 is a flowchart of the processing.

A document is read by the binary image input unit 11, such as a scanner, and the resulting binary image is stored in the binary image memory 12 (processing step 31). The table area recognition unit 13 recognizes the table area in this document image and stores its image in the table area image memory 14 (processing step 32). The table area can be recognized either automatically, using a run-length distribution or the like, or by having the area designated externally with a mouse or similar device.

The main scanning direction line segment extraction unit 15 traces black pixels connected in the main scanning direction in the table area image, extracts the rectangle enclosing each ruled line in the main scanning direction, and stores its start point coordinates (Xs, Ys) and end point coordinates (Xe, Ye) in the main scanning direction line segment coordinate memory 16 (see FIG. 5; processing step 33).
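The patent does not spell out how the black pixels are traced, so the following is only a minimal sketch of one way to obtain such rule rectangles. It assumes the image is a list of rows of 0/1 values (1 = black), treats only horizontal runs of at least `min_len` pixels as rule candidates, and merges runs in adjacent rows that overlap or abut in X so that the stair-stepped runs of a skewed rule end up in one rectangle. The names `long_runs` and `horizontal_rule_rects` and the `min_len` threshold are hypothetical.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (Xs, Ys, Xe, Ye), inclusive pixel coordinates


def long_runs(row: Sequence[int], min_len: int) -> List[Tuple[int, int]]:
    """Return (x_start, x_end) for each horizontal black run of at least min_len pixels."""
    runs = []
    start = None
    # A trailing 0 is appended so that a run reaching the right edge is closed.
    for x, value in enumerate(list(row) + [0]):
        if value and start is None:
            start = x
        elif not value and start is not None:
            if x - start >= min_len:
                runs.append((start, x - 1))
            start = None
    return runs


def horizontal_rule_rects(img: Sequence[Sequence[int]], min_len: int) -> List[Rect]:
    """Group long horizontal runs from adjacent rows into enclosing rectangles."""
    rects: List[List[int]] = []
    for y, row in enumerate(img):
        for xs, xe in long_runs(row, min_len):
            for r in rects:
                # Merge when the rectangle reaches the previous (or current) row
                # and its X range overlaps or abuts the new run.
                if r[3] >= y - 1 and xs <= r[2] + 1 and xe >= r[0] - 1:
                    r[0] = min(r[0], xs)
                    r[2] = max(r[2], xe)
                    r[3] = y
                    break
            else:
                rects.append([xs, y, xe, y])
    return [(r[0], r[1], r[2], r[3]) for r in rects]


if __name__ == "__main__":
    # A slightly skewed rule: its right half sits one row lower than its left half.
    image = [
        [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    ]
    print(horizontal_rule_rects(image, min_len=3))
```

This simplified version does not join two existing rectangles that a later run bridges; a fuller implementation would need to handle that case. Rules in the sub-scanning direction could be obtained the same way by applying the function to the transposed image.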

The sub-scanning direction line segment extraction unit 17 likewise extracts the rectangles of the ruled lines in the sub-scanning direction from the table area image and stores their start and end point coordinates in the sub-scanning direction line segment coordinate memory 18 (processing step 34).

Next, the frame recognition unit 19 recognizes each frame from the combination of ruled lines in the main scanning direction and ruled lines in the sub-scanning direction, and stores, for example, the coordinates of the diagonal vertices of the frame in the frame coordinate memory 20 (processing step 35). As described above, the coordinates of the diagonal vertices are obtained from the outer coordinates of the rectangles of the ruled lines forming the frame (the upper coordinate for the top ruled line, the lower coordinate for the bottom ruled line, the left coordinate for the left ruled line, and the right coordinate for the right ruled line). In this way, a frame region such as the hatched region in FIG. 3 is recognized.
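The choice of coordinates can be sketched as follows. This is an illustration only, assuming each rule rectangle is stored as (Xs, Ys, Xe, Ye) with Y increasing downward, as in the description of FIG. 5; the `Rect` type and the function names `frame_outer` and `frame_inner` are hypothetical, and `frame_inner` is included only to contrast with the conventional method.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle with inclusive corners; Y increases downward."""
    xs: int
    ys: int
    xe: int
    ye: int


def frame_outer(top: Rect, bottom: Rect, left: Rect, right: Rect) -> Rect:
    """Frame from the outer sides of the four rule rectangles (the approach described above)."""
    return Rect(xs=left.xs, ys=top.ys, xe=right.xe, ye=bottom.ye)


def frame_inner(top: Rect, bottom: Rect, left: Rect, right: Rect) -> Rect:
    """Frame from the inner sides of the rule rectangles (the conventional approach)."""
    return Rect(xs=left.xe, ys=top.ye, xe=right.xs, ye=bottom.ys)


if __name__ == "__main__":
    # Rule rectangles of a slightly skewed frame: each is several pixels thick
    # because it encloses a tilted line.
    top = Rect(10, 20, 210, 28)
    bottom = Rect(10, 120, 210, 128)
    left = Rect(8, 20, 14, 128)
    right = Rect(206, 20, 212, 128)
    print("outer:", frame_outer(top, bottom, left, right))
    print("inner:", frame_inner(top, bottom, left, right))
```

The two returned rectangles correspond to the diagonal vertices stored in the frame coordinate memory; the outer version is larger on every side by the thickness of the corresponding rule rectangle.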

Next, the frame area extraction unit 21 cuts the image inside the frame out of the table area image according to the frame coordinates and stores it in the in-frame image memory 22 (processing step 36).

The black-connected circumscribed rectangle extraction unit 23 extracts the circumscribed rectangles of the connected black-pixel components from the in-frame image and stores their coordinates in the circumscribed rectangle memory 24 (processing step 37).
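A minimal sketch of this step is given below. It assumes the in-frame image is a list of rows of 0/1 values (1 = black) and uses 8-connectivity with a breadth-first search; the patent does not state which connectivity or labelling algorithm is used, and the function name `black_component_rects` is hypothetical.

```python
from collections import deque
from typing import List, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (xs, ys, xe, ye), inclusive pixel coordinates


def black_component_rects(img: Sequence[Sequence[int]]) -> List[Rect]:
    """Return the circumscribed rectangle of every 8-connected black component."""
    height = len(img)
    width = len(img[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    rects: List[Rect] = []
    for y0 in range(height):
        for x0 in range(width):
            if not img[y0][x0] or seen[y0][x0]:
                continue
            # Flood-fill one component, tracking its extreme coordinates.
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            xs = xe = x0
            ys = ye = y0
            while queue:
                x, y = queue.popleft()
                xs, xe = min(xs, x), max(xe, x)
                ys, ye = min(ys, y), max(ye, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            rects.append((xs, ys, xe, ye))
    return rects


if __name__ == "__main__":
    image = [
        [1, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 0, 1],
    ]
    print(black_component_rects(image))
```

Rectangles are returned in the order in which their first pixel is met in a row-by-row scan. For large images, a library labelling routine would normally be preferred over this pure-Python loop.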

The line cutout unit 25 compares the coordinates of each extracted circumscribed rectangle with the coordinates of the frame to check whether the rectangle touches the frame, and removes any circumscribed rectangle touching the frame on the assumption that it is part of a ruled line forming the frame (processing step 38). For the circumscribed rectangles remaining in the frame, it then estimates the character size and merges the rectangles to generate the character lines in the frame (merged rectangles of the circumscribed rectangles that make up the character elements), corrects or deletes them as necessary, and finally cuts out the character line images from the in-frame image memory 22 and stores them in the line image memory 26 (processing steps 39, 40, 41, 42).
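The contact test and removal can be sketched as below. The sketch assumes rectangles are given as (xs, ys, xe, ye) in the coordinate system of the cropped in-frame image, so that the frame boundary is the image border; the `margin` parameter and the names `touches_frame`, `drop_rule_fragments`, and `merged_rect` are hypothetical. The patent's line generation also involves character size estimation, correction, and deletion, which are not described in detail and are reduced here to a plain union of the surviving rectangles.

```python
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (xs, ys, xe, ye), inclusive, relative to the in-frame image


def touches_frame(rect: Rect, width: int, height: int, margin: int = 0) -> bool:
    """True if the rectangle reaches the border of a width x height in-frame image."""
    xs, ys, xe, ye = rect
    return (xs <= margin or ys <= margin
            or xe >= width - 1 - margin or ye >= height - 1 - margin)


def drop_rule_fragments(rects: List[Rect], width: int, height: int,
                        margin: int = 0) -> List[Rect]:
    """Discard rectangles touching the frame, treating them as pieces of ruled lines."""
    return [r for r in rects if not touches_frame(r, width, height, margin)]


def merged_rect(rects: List[Rect]) -> Optional[Rect]:
    """Union of the given rectangles, a crude stand-in for character line generation."""
    if not rects:
        return None
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


if __name__ == "__main__":
    rects = [
        (0, 0, 99, 3),     # fragment of the top rule, touches the frame
        (10, 10, 20, 25),  # character
        (24, 11, 33, 25),  # character
        (0, 0, 2, 39),     # fragment of the left rule, touches the frame
    ]
    kept = drop_rule_fragments(rects, width=100, height=40)
    print(kept)
    print(merged_rect(kept))
```

A character that genuinely touches a ruled line would be discarded by this test as well; the sketch does not attempt to separate such cases.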

Next, the character cutout and recognition unit 27 cuts out the characters from the character line images and recognizes them (processing step 44).

[Effects of the Invention]

As is clear from the above description, according to the present invention, even when a document image is input at a skew, the recognized frame does not become unduly narrower than the actual frame, so loss of character images can be prevented. In addition, the ruled lines forming the frame are removed before character extraction by discarding those circumscribed rectangles of connected black-pixel components in the recognized frame that touch the frame, so characters can be cut out and recognized accurately.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention, FIG. 2 is a flowchart of the processing, FIG. 3 is an explanatory diagram of frame recognition, FIG. 4 is an explanatory diagram of the problem with frame recognition by the conventional method, and FIG. 5 is an explanatory diagram of the rectangle of a ruled line.

11: binary image input unit; 13: table area recognition unit; 15: main scanning direction line segment extraction unit; 17: sub-scanning direction line segment extraction unit; 19: frame recognition unit; 21: frame area extraction unit; 23: black-connected circumscribed rectangle extraction unit; 25: line cutout unit; 27: character cutout and recognition unit.

Claims (1)

(57) [Claims]

[Claim 1] A table processing method characterized in that, in a table area, rectangles of ruled lines in the main scanning direction and the sub-scanning direction are extracted; a frame enclosed by the ruled lines is recognized using the coordinates of the outer sides of the rectangles of the ruled lines; circumscribed rectangles of connected black-pixel components in the frame are obtained; the circumscribed rectangles touching the frame are removed; and the characters in the frame are cut out using the remaining circumscribed rectangles.
JP1314519A 1989-11-30 1989-11-30 Table processing method Expired - Fee Related JP2851089B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP1314519A JP2851089B2 (en) 1989-11-30 1989-11-30 Table processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP1314519A JP2851089B2 (en) 1989-11-30 1989-11-30 Table processing method

Publications (2)

Publication Number Publication Date
JPH03172984A JPH03172984A (en) 1991-07-26
JP2851089B2 true JP2851089B2 (en) 1999-01-27

Family

ID=18054260

Family Applications (1)

Application Number Title Priority Date Filing Date
JP1314519A Expired - Fee Related JP2851089B2 (en) 1989-11-30 1989-11-30 Table processing method

Country Status (1)

Country Link
JP (1) JP2851089B2 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5848186A (en) * 1995-08-11 1998-12-08 Canon Kabushiki Kaisha Feature extraction system for identifying text within a table image
US5898795A (en) * 1995-12-08 1999-04-27 Ricoh Company, Ltd. Character recognition method using a method for deleting ruled lines
JP4329764B2 (en) 2006-01-17 2009-09-09 コニカミノルタビジネステクノロジーズ株式会社 Image processing apparatus and ruled line extraction program
JP4424309B2 (en) 2006-01-23 2010-03-03 コニカミノルタビジネステクノロジーズ株式会社 Image processing apparatus, character determination program, and character determination method

Also Published As

Publication number Publication date
JPH03172984A (en) 1991-07-26

Similar Documents

Publication Publication Date Title
JP2812982B2 (en) Table recognition method
JP4189506B2 (en) Apparatus, method and recording medium for image processing
JP2851089B2 (en) Table processing method
JP2851087B2 (en) Table processing method
JP4281236B2 (en) Image recognition apparatus, image recognition method, and computer-readable recording medium storing image recognition program
JP2003067670A (en) Image processor, image processing method, image processing program and computer-readable recording medium having image processing program recorded thereon
JP2957729B2 (en) Line direction determination device
JP2800192B2 (en) High-speed character / graphic separation device
JP4040231B2 (en) Character extraction method and apparatus, and storage medium
JP3140079B2 (en) Ruled line recognition method and table processing method
JP3046652B2 (en) How to correct the inclination of text documents
JP3391987B2 (en) Form recognition device
JPH09161007A (en) Method for recognizing character in table area
JP2931041B2 (en) Character recognition method in table
JP2948840B2 (en) Rectangle extraction method
JP3157534B2 (en) Table recognition method
JP3566738B2 (en) Shaded area processing method and shaded area processing apparatus
JPH05128305A (en) Area dividing method
JPH02253386A (en) Character recognizing device
JPH0343879A (en) Character area separating system for character recognizing device
JPH05108880A (en) English character recognition device
JPH09237321A (en) Device for recognizing handwritten character
JPS6327990A (en) Character recognizing method
JPH0934995A (en) Method and device for processing image
JPH04167194A (en) Table processing system

Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20071113

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20081113

Year of fee payment: 10

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20091113

Year of fee payment: 11

LAPS Cancellation because of no payment of annual fees