JPH03172984A - Table processing method - Google Patents
Table processing method

Info
- Publication number
- JPH03172984A (application number JP1314519A)
- Authority
- JP
- Japan
- Prior art keywords
- frame
- scanning direction
- image
- rectangle
- black
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000003672 processing method Methods 0.000 title claims description 4
- 239000000284 extract Substances 0.000 claims abstract description 10
- 238000000605 extraction Methods 0.000 description 8
- 238000010586 diagram Methods 0.000 description 7
- 238000000034 method Methods 0.000 description 5
- 238000007796 conventional method Methods 0.000 description 1
- 238000012937 correction Methods 0.000 description 1
Abstract
Description
DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a method for processing tables contained in images of documents, forms, and the like in a character recognition apparatus or similar device.
[Prior Art]

When a character recognition apparatus processes a document image, the image is often divided into text areas, image areas such as photographs and figures, table areas, and so on, and each is processed separately.
For a table area, the usual method is to recognize each frame (cell) in the table from the position coordinates of the ruled lines and then to cut characters out of the image within each frame for recognition.
In such table processing, the conventional approach extracts the rectangles bounding the ruled lines in the main and sub-scanning directions that form a frame, and defines the frame using the inner coordinates of these rectangles (as seen from inside the frame).
However, this raises a problem: when the document image is input at an angle, the characters within a frame can no longer be cut out correctly.
For example, consider an image such as that shown in FIG. 4. The solid lines are the ruled lines (line segments), and they are extracted as the rectangles drawn with broken lines. FIG. 5 is an explanatory diagram of the rectangle of a ruled line in the main scanning direction (X direction): 51 is the ruled line as actually input, and 52 is its bounding rectangle. When a frame surrounded by ruled lines is recognized, the conventional method uses the inner coordinates of the ruled-line rectangles; for example, for the main-scanning-direction rectangle forming the upper side of a frame, the Y coordinate of the frame's upper side is defined by the coordinate Ye shown in FIG. 5. Consequently, for the skewed image of FIG. 4, the hatched area is extracted as the frame area, and the frame becomes narrower than it actually is. As a result, characters close to the ruled lines inside the frame protrude beyond the frame and, in some cases, can no longer be cut out correctly.
An object of the present invention is to provide a table processing method capable of correctly cutting out and recognizing the characters in a table even when the document image is input at an angle.
The table processing method of the present invention is characterized by: extracting, in a table area, rectangles that bound the ruled lines in the main scanning direction and the sub-scanning direction; recognizing each frame surrounded by ruled lines using the outer coordinates of the ruled-line rectangles (as seen from inside the frame); obtaining the bounding rectangles of the black connected components within the frame; removing the bounding rectangles that touch the frame; and cutting out the characters within the frame using the remaining bounding rectangles.
[Operation]

Because each frame is recognized from the outer coordinates of the ruled-line rectangles, the recognized frame never becomes extremely narrower than the actual frame even when the document is skewed, so missing characters are prevented.
For example, for a skewed table image like that shown in FIG. 4, the frame recognized according to the present invention becomes wider, as shown in FIG. 3.
On the other hand, because the outer coordinates of the ruled-line rectangles are used, parts of the ruled lines forming the frame are included inside the recognized frame. However, these ruled-line segments can be removed by excluding, among the bounding rectangles of the black connected components within the frame, those that touch the frame.
Therefore, even for a skewed document image, the characters in the table can be cut out and recognized correctly.
FIG. 1 is a block diagram showing one embodiment of the present invention, and FIG. 2 is a flowchart of its processing.
A document is read by a binary image input unit 11 such as a scanner, and the binary image is stored in a binary image memory 12 (processing step 31). A table area recognition unit 13 then recognizes the table area in this document image and stores its image in a table area image memory 14 (processing step 32). The table area may be recognized either automatically, for example from the run-length distribution, or by designating the area externally with a mouse or similar device.
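The patent does not spell out the run-length criterion, so the following is only one illustrative reading: rows containing a very long horizontal black run (a ruled line) mark the vertical extent of a table area. The function name and the `min_rule_len` threshold are assumptions, not part of the original disclosure.

```python
def table_rows_by_run_length(img, min_rule_len=50):
    """Flag rows likely to belong to a table area.

    img: 2D list of 0/1 pixels (1 = black).  A row whose longest black run
    is at least min_rule_len is assumed to contain a ruled line; the set of
    such rows bounds candidate table areas.
    """
    hits = []
    for y, row in enumerate(img):
        best = run = 0
        for px in row:
            run = run + 1 if px else 0
            best = max(best, run)
        if best >= min_rule_len:
            hits.append(y)
    return hits
```

In practice such a heuristic would be combined with a vertical-run check, since a table also contains ruled lines in the sub-scanning direction.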
In a main-scanning-direction line segment extraction unit 15, black pixels connected in the main scanning direction are traced in the table area image, a rectangle bounding each ruled line in the main scanning direction is extracted, and its start coordinates (Xs, Ys) and end coordinates (Xe, Ye) are stored in a main-scanning-direction line segment coordinate memory 16 (see FIG. 5, processing step 33).
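The tracing step can be sketched as follows, assuming the table area image is a 2D array of 0/1 pixels. The patent only states that connected black pixels are traced and bounded; the `min_len` threshold and the row-to-row merging rule below are illustrative assumptions.

```python
def extract_horizontal_rule_rects(img, min_len=20):
    """Trace horizontally connected black pixels and bound each long run.

    img: 2D list, truthy = black pixel.  Runs shorter than min_len are
    treated as character strokes rather than ruled lines.  A run that
    overlaps a rectangle ending on the previous row is merged into it, so
    a slightly skewed rule still yields a single rectangle (Xs, Ys, Xe, Ye),
    the start/end coordinates stored in coordinate memory 16.
    """
    rects = []  # mutable [Xs, Ys, Xe, Ye]
    for y, row in enumerate(img):
        x, w = 0, len(row)
        while x < w:
            if row[x]:
                x0 = x
                while x < w and row[x]:
                    x += 1
                if x - x0 >= min_len:
                    for r in rects:
                        # vertically adjacent and horizontally overlapping?
                        if r[3] == y - 1 and r[0] <= x - 1 and x0 <= r[2]:
                            r[0], r[2], r[3] = min(r[0], x0), max(r[2], x - 1), y
                            break
                    else:
                        rects.append([x0, y, x - 1, y])
            else:
                x += 1
    return [tuple(r) for r in rects]
```

The same routine applied to the transposed image would give the sub-scanning-direction rectangles of processing step 34.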
A sub-scanning-direction line segment extraction unit 17 similarly extracts the rectangles of the ruled lines in the sub-scanning direction from the table area image and stores their start and end coordinates in a sub-scanning-direction line segment coordinate memory 18 (processing step 34).
Next, a frame recognition unit 19 recognizes frames from combinations of main-scanning-direction and sub-scanning-direction ruled lines, and stores, for example, the coordinates of the diagonal vertices of each frame in a frame coordinate memory 20 (step 35). At this time, as described above, the outer coordinates of the ruled-line rectangles forming the frame (the upper side for the top ruled line, the lower side for the bottom ruled line, the left side for the left ruled line, and the right side for the right ruled line) are used to obtain the coordinates of the frame's diagonal vertices. In this way, a frame area such as the hatched area shown in FIG. 3 is recognized.
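The choice of outer rather than inner edges is the core of the method and can be stated in a few lines. The function name below is hypothetical; each rectangle is assumed to be given as (Xs, Ys, Xe, Ye) with Xs ≤ Xe and Ys ≤ Ye, as in the coordinate memories above.

```python
def frame_from_rules(top, bottom, left, right):
    """Diagonal vertices of a frame from its four ruled-line rectangles.

    The conventional method uses the inner edges (e.g. the bottom edge of
    the top rule's rectangle), which shaves off part of a skewed frame.
    Here the *outer* edges are taken instead: the frame's top is the top
    edge of the top rule's rectangle, its left is the left edge of the
    left rule's rectangle, and so on.
    """
    x0 = left[0]    # left side  : outermost (leftmost) edge of left rule
    y0 = top[1]     # top side   : outermost (topmost) edge of top rule
    x1 = right[2]   # right side : outermost (rightmost) edge of right rule
    y1 = bottom[3]  # bottom side: outermost (bottommost) edge of bottom rule
    return (x0, y0), (x1, y1)
```

The price of this choice is that the frame image now contains ruled-line fragments along its border, which processing step 38 removes.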
Next, a frame area extraction unit 21 cuts the image inside each frame out of the table area image according to the frame coordinates and stores it in an in-frame image memory 22 (processing step 36).
A black-connected-component bounding rectangle extraction unit 23 extracts the bounding rectangles of the black connected components from the in-frame image and stores their coordinates in a bounding rectangle memory 24 (processing step 37).
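The patent does not describe how unit 23 labels connected components; a plain breadth-first labeling, sketched below under that assumption, produces the same bounding rectangles. 8-connectivity is assumed so that diagonal strokes stay in one component.

```python
from collections import deque

def black_cc_bounding_rects(img):
    """Bounding rectangles (Xs, Ys, Xe, Ye) of 8-connected black components.

    img: 2D list with truthy values for black pixels (the in-frame image).
    Components are found by BFS flood fill; each yields one rectangle, as
    stored in the bounding rectangle memory 24.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    rects = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                x0 = x1 = x
                y0 = y1 = y
                q = deque([(y, x)])
                seen[y][x] = True
                while q:
                    cy, cx = q.popleft()
                    x0, x1 = min(x0, cx), max(x1, cx)
                    y0, y1 = min(y0, cy), max(y1, cy)
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w \
                               and img[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                q.append((ny, nx))
                rects.append((x0, y0, x1, y1))
    return rects
```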
In a line extraction unit 25, the coordinates of each extracted bounding rectangle are compared with the frame coordinates to check for contact between the rectangle and the frame, and bounding rectangles touching the frame are regarded as parts of the ruled lines forming the frame and removed (processing step 38). For the remaining bounding rectangles within the frame, character size estimation and merging are then performed to generate the character lines within the frame (merged rectangles of the bounding rectangles constituting the character elements); any necessary correction or deletion is applied, and the final character line images are cut out of the in-frame image memory 22 and stored in a line image memory 26 (processing steps 39, 40, 41, 42).
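The contact check of processing step 38 reduces to a border test once coordinates are expressed in the cut-out frame image. The function name is hypothetical; the test assumes, as in the description, that the ruled-line fragments lie along the border of the frame image because the frame was defined by the rules' outer coordinates.

```python
def remove_rule_fragments(rects, frame_w, frame_h):
    """Step 38: drop bounding rectangles that touch the frame border.

    rects: (Xs, Ys, Xe, Ye) tuples in the coordinates of the frame image
    of size frame_w x frame_h.  A rectangle touching any edge of that image
    is taken to be a ruled-line fragment rather than a character and is
    removed; the survivors feed the character-line generation of steps
    39-42.
    """
    kept = []
    for (x0, y0, x1, y1) in rects:
        touches = (x0 == 0 or y0 == 0
                   or x1 == frame_w - 1 or y1 == frame_h - 1)
        if not touches:
            kept.append((x0, y0, x1, y1))
    return kept
```

Note that this also discards genuine characters that touch the frame border; the correction/deletion of steps 40 and 41 is presumably where such cases would be handled.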
Next, a character extraction/recognition unit 27 cuts characters out of each character line image and recognizes them (processing step 44).
As is clear from the above description, according to the present invention, even when a document image is input at an angle, the recognized frame does not become inappropriately narrower than the actual frame, so missing character images are prevented. Moreover, the ruled lines forming a frame are removed before character extraction, by eliminating those bounding rectangles of the black connected components within the recognized frame that touch the frame. Characters can therefore be cut out and recognized accurately.
FIG. 1 is a block diagram showing one embodiment of the present invention; FIG. 2 is a flowchart of the processing; FIG. 3 is an explanatory diagram of frame recognition; FIG. 4 is an explanatory diagram of the problem with frame recognition by the conventional method; and FIG. 5 is an explanatory diagram of a ruled-line rectangle.
11: binary image input unit; 13: table area recognition unit; 15: main-scanning-direction line segment extraction unit; 17: sub-scanning-direction line segment extraction unit; 19: frame recognition unit; 21: frame area extraction unit; 23: black-connected-component bounding rectangle extraction unit; 25: line extraction unit; 27: character extraction/recognition unit.
Claims (1)

(1) A table processing method characterized by: extracting, in a table area, rectangles bounding the ruled lines in the main scanning direction and the sub-scanning direction; recognizing a frame surrounded by the ruled lines using the outer coordinates of the ruled-line rectangles; obtaining the bounding rectangles of the black connected components within the frame; removing the bounding rectangles that touch the frame; and cutting out the characters within the frame using the remaining bounding rectangles.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP1314519A JP2851089B2 (en) | 1989-11-30 | 1989-11-30 | Table processing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP1314519A JP2851089B2 (en) | 1989-11-30 | 1989-11-30 | Table processing method |
Publications (2)
Publication Number | Publication Date |
---|---|
JPH03172984A true JPH03172984A (en) | 1991-07-26 |
JP2851089B2 JP2851089B2 (en) | 1999-01-27 |
Family
ID=18054260
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP1314519A Expired - Fee Related JP2851089B2 (en) | 1989-11-30 | 1989-11-30 | Table processing method |
Country Status (1)
Country | Link |
---|---|
JP (1) | JP2851089B2 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5848186A (en) * | 1995-08-11 | 1998-12-08 | Canon Kabushiki Kaisha | Feature extraction system for identifying text within a table image |
US5898795A (en) * | 1995-12-08 | 1999-04-27 | Ricoh Company, Ltd. | Character recognition method using a method for deleting ruled lines |
US7660014B2 (en) | 2006-01-17 | 2010-02-09 | Konica Minolta Business Technologies, Inc. | Image processing apparatus capable of extracting rule from document image with high precision |
US8208744B2 (en) | 2006-01-23 | 2012-06-26 | Konica Minolta Business Technologies, Inc. | Image processing apparatus capable of accurately and quickly determining character part included in image |
Also Published As
Publication number | Publication date |
---|---|
JP2851089B2 (en) | 1999-01-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP2940936B2 (en) | Tablespace identification method | |
US5075895A (en) | Method and apparatus for recognizing table area formed in binary image of document | |
JP4189506B2 (en) | Apparatus, method and recording medium for image processing | |
JPH03122773A (en) | Image forming device | |
JP4077094B2 (en) | Color document image recognition device | |
JPH03172984A (en) | Table processing method | |
JPH0991453A (en) | Image processing method and its unit | |
JP2851087B2 (en) | Table processing method | |
JP2890306B2 (en) | Table space separation apparatus and table space separation method | |
JP2800192B2 (en) | High-speed character / graphic separation device | |
JP4040231B2 (en) | Character extraction method and apparatus, and storage medium | |
JP3140079B2 (en) | Ruled line recognition method and table processing method | |
JPH0468481A (en) | Character segmenting device | |
JPS61190679A (en) | Character data processing device | |
JPH0728934A (en) | Document image processor | |
JP2887803B2 (en) | Document image processing device | |
JPH09161007A (en) | Method for recognizing character in table area | |
JPH04106670A (en) | Document picture processor | |
JPH05128305A (en) | Area dividing method | |
JPH0728933A (en) | Character recognition device | |
JP3566738B2 (en) | Shaded area processing method and shaded area processing apparatus | |
JPH05174178A (en) | Character recognizing method | |
JPH02253386A (en) | Character recognizing device | |
JPH04167194A (en) | Table processing system | |
JPH0266681A (en) | Drawing processor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FPAY | Renewal fee payment (event date is renewal date of database) |
Free format text: PAYMENT UNTIL: 20071113 Year of fee payment: 9 |
|
FPAY | Renewal fee payment (event date is renewal date of database) |
Free format text: PAYMENT UNTIL: 20081113 Year of fee payment: 10 |
|
FPAY | Renewal fee payment (event date is renewal date of database) |
Free format text: PAYMENT UNTIL: 20091113 Year of fee payment: 11 |
|
LAPS | Cancellation because of no payment of annual fees |