JPH03172984A - Table processing method - Google Patents
Table processing method

Info
- Publication number
- JPH03172984A (application number JP1314519A)
- Authority
- JP
- Japan
- Prior art keywords
- frame
- scanning direction
- image
- rectangle
- black
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000003672 processing method Methods 0.000 title claims description 4
- 239000000284 extract Substances 0.000 claims abstract description 10
- 238000000605 extraction Methods 0.000 description 8
- 238000010586 diagram Methods 0.000 description 7
- 238000000034 method Methods 0.000 description 5
- 238000007796 conventional method Methods 0.000 description 1
- 238000012937 correction Methods 0.000 description 1
Abstract
Description
DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a method for processing tables contained in images of documents, forms, and the like in a character recognition apparatus or similar device.
[Prior Art]

When a character recognition apparatus processes a document image, the image is often divided into text areas, image areas such as photographs and figures, table areas, and so on, and each is processed separately.
For a table area, the usual method is to recognize each frame (cell) in the table from the position coordinates of the ruled lines and then to cut characters out of the image within each frame for recognition.
In such table processing, the conventional approach extracts the rectangles bounding the ruled lines in the main and sub-scanning directions that form a frame, and defines the frame using the inner coordinates of these rectangles (as seen from inside the frame).
However, this raises a problem: when the document image is input at an angle, the characters within a frame can no longer be cut out correctly.
For example, consider an image such as that shown in FIG. 4. The solid lines are the ruled lines (line segments), and they are extracted as the rectangles drawn with broken lines. FIG. 5 is an explanatory diagram of the rectangle of a ruled line in the main scanning direction (X direction): 51 is the ruled line as actually input, and 52 is its bounding rectangle. When a frame surrounded by ruled lines is recognized, the conventional method uses the inner coordinates of the ruled-line rectangles; for example, for the main-scanning-direction rectangle forming the upper side of a frame, the Y coordinate of the frame's upper side is defined by the coordinate Ye shown in FIG. 5. Consequently, for the skewed image of FIG. 4, the hatched area is extracted as the frame area, and the frame becomes narrower than it actually is. As a result, characters close to the ruled lines inside the frame protrude beyond the frame and, in some cases, can no longer be cut out correctly.
An object of the present invention is to provide a table processing method capable of correctly cutting out and recognizing the characters in a table even when the document image is input at an angle.
The table processing method of the present invention is characterized by: extracting, in a table area, rectangles that bound the ruled lines in the main scanning direction and the sub-scanning direction; recognizing each frame surrounded by ruled lines using the outer coordinates of the ruled-line rectangles (as seen from inside the frame); obtaining the bounding rectangles of the black connected components within the frame; removing the bounding rectangles that touch the frame; and cutting out the characters within the frame using the remaining bounding rectangles.
[Operation]

Because each frame is recognized from the outer coordinates of the ruled-line rectangles, the recognized frame never becomes extremely narrower than the actual frame even when the document is skewed, so missing characters are prevented.
For example, for a skewed table image like that shown in FIG. 4, the frame recognized according to the present invention becomes wider, as shown in FIG. 3.
On the other hand, because the outer coordinates of the ruled-line rectangles are used, parts of the ruled lines forming the frame are included inside the recognized frame. However, these ruled-line segments can be removed by excluding, among the bounding rectangles of the black connected components within the frame, those that touch the frame.
Therefore, even for a skewed document image, the characters in the table can be cut out and recognized correctly.
FIG. 1 is a block diagram showing one embodiment of the present invention, and FIG. 2 is a flowchart of its processing.
A document is read by a binary image input unit 11 such as a scanner, and the binary image is stored in a binary image memory 12 (processing step 31). A table area recognition unit 13 then recognizes the table area in this document image and stores its image in a table area image memory 14 (processing step 32). The table area may be recognized either automatically, for example from the run-length distribution, or by designating the area externally with a mouse or similar device.
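The patent does not spell out the run-length criterion, so the following is only one illustrative reading: rows containing a very long horizontal black run (a ruled line) mark the vertical extent of a table area. The function name and the `min_rule_len` threshold are assumptions, not part of the original disclosure.

```python
def table_rows_by_run_length(img, min_rule_len=50):
    """Flag rows likely to belong to a table area.

    img: 2D list of 0/1 pixels (1 = black).  A row whose longest black run
    is at least min_rule_len is assumed to contain a ruled line; the set of
    such rows bounds candidate table areas.
    """
    hits = []
    for y, row in enumerate(img):
        best = run = 0
        for px in row:
            run = run + 1 if px else 0
            best = max(best, run)
        if best >= min_rule_len:
            hits.append(y)
    return hits
```

In practice such a heuristic would be combined with a vertical-run check, since a table also contains ruled lines in the sub-scanning direction.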
In a main-scanning-direction line segment extraction unit 15, black pixels connected in the main scanning direction are traced in the table area image, a rectangle bounding each ruled line in the main scanning direction is extracted, and its start coordinates (Xs, Ys) and end coordinates (Xe, Ye) are stored in a main-scanning-direction line segment coordinate memory 16 (see FIG. 5, processing step 33).
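The tracing step can be sketched as follows, assuming the table area image is a 2D array of 0/1 pixels. The patent only states that connected black pixels are traced and bounded; the `min_len` threshold and the row-to-row merging rule below are illustrative assumptions.

```python
def extract_horizontal_rule_rects(img, min_len=20):
    """Trace horizontally connected black pixels and bound each long run.

    img: 2D list, truthy = black pixel.  Runs shorter than min_len are
    treated as character strokes rather than ruled lines.  A run that
    overlaps a rectangle ending on the previous row is merged into it, so
    a slightly skewed rule still yields a single rectangle (Xs, Ys, Xe, Ye),
    the start/end coordinates stored in coordinate memory 16.
    """
    rects = []  # mutable [Xs, Ys, Xe, Ye]
    for y, row in enumerate(img):
        x, w = 0, len(row)
        while x < w:
            if row[x]:
                x0 = x
                while x < w and row[x]:
                    x += 1
                if x - x0 >= min_len:
                    for r in rects:
                        # vertically adjacent and horizontally overlapping?
                        if r[3] == y - 1 and r[0] <= x - 1 and x0 <= r[2]:
                            r[0], r[2], r[3] = min(r[0], x0), max(r[2], x - 1), y
                            break
                    else:
                        rects.append([x0, y, x - 1, y])
            else:
                x += 1
    return [tuple(r) for r in rects]
```

The same routine applied to the transposed image would give the sub-scanning-direction rectangles of processing step 34.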
A sub-scanning-direction line segment extraction unit 17 similarly extracts the rectangles of the ruled lines in the sub-scanning direction from the table area image and stores their start and end coordinates in a sub-scanning-direction line segment coordinate memory 18 (processing step 34).
Next, a frame recognition unit 19 recognizes frames from combinations of main-scanning-direction and sub-scanning-direction ruled lines, and stores, for example, the coordinates of the diagonal vertices of each frame in a frame coordinate memory 20 (step 35). At this time, as described above, the outer coordinates of the ruled-line rectangles forming the frame (the upper side for the top ruled line, the lower side for the bottom ruled line, the left side for the left ruled line, and the right side for the right ruled line) are used to obtain the coordinates of the frame's diagonal vertices. In this way, a frame area such as the hatched area shown in FIG. 3 is recognized.
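The choice of outer rather than inner edges is the core of the method and can be stated in a few lines. The function name below is hypothetical; each rectangle is assumed to be given as (Xs, Ys, Xe, Ye) with Xs ≤ Xe and Ys ≤ Ye, as in the coordinate memories above.

```python
def frame_from_rules(top, bottom, left, right):
    """Diagonal vertices of a frame from its four ruled-line rectangles.

    The conventional method uses the inner edges (e.g. the bottom edge of
    the top rule's rectangle), which shaves off part of a skewed frame.
    Here the *outer* edges are taken instead: the frame's top is the top
    edge of the top rule's rectangle, its left is the left edge of the
    left rule's rectangle, and so on.
    """
    x0 = left[0]    # left side  : outermost (leftmost) edge of left rule
    y0 = top[1]     # top side   : outermost (topmost) edge of top rule
    x1 = right[2]   # right side : outermost (rightmost) edge of right rule
    y1 = bottom[3]  # bottom side: outermost (bottommost) edge of bottom rule
    return (x0, y0), (x1, y1)
```

The price of this choice is that the frame image now contains ruled-line fragments along its border, which processing step 38 removes.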
Next, a frame area extraction unit 21 cuts the image inside each frame out of the table area image according to the frame coordinates and stores it in an in-frame image memory 22 (processing step 36).
A black-connected-component bounding rectangle extraction unit 23 extracts the bounding rectangles of the black connected components from the in-frame image and stores their coordinates in a bounding rectangle memory 24 (processing step 37).
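The patent does not describe how unit 23 labels connected components; a plain breadth-first labeling, sketched below under that assumption, produces the same bounding rectangles. 8-connectivity is assumed so that diagonal strokes stay in one component.

```python
from collections import deque

def black_cc_bounding_rects(img):
    """Bounding rectangles (Xs, Ys, Xe, Ye) of 8-connected black components.

    img: 2D list with truthy values for black pixels (the in-frame image).
    Components are found by BFS flood fill; each yields one rectangle, as
    stored in the bounding rectangle memory 24.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    rects = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                x0 = x1 = x
                y0 = y1 = y
                q = deque([(y, x)])
                seen[y][x] = True
                while q:
                    cy, cx = q.popleft()
                    x0, x1 = min(x0, cx), max(x1, cx)
                    y0, y1 = min(y0, cy), max(y1, cy)
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w \
                               and img[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                q.append((ny, nx))
                rects.append((x0, y0, x1, y1))
    return rects
```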
In a line extraction unit 25, the coordinates of each extracted bounding rectangle are compared with the frame coordinates to check for contact between the rectangle and the frame, and bounding rectangles touching the frame are regarded as parts of the ruled lines forming the frame and removed (processing step 38). For the remaining bounding rectangles within the frame, character size estimation and merging are then performed to generate the character lines within the frame (merged rectangles of the bounding rectangles constituting the character elements); any necessary correction or deletion is applied, and the final character line images are cut out of the in-frame image memory 22 and stored in a line image memory 26 (processing steps 39, 40, 41, 42).
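The contact check of processing step 38 reduces to a border test once coordinates are expressed in the cut-out frame image. The function name is hypothetical; the test assumes, as in the description, that the ruled-line fragments lie along the border of the frame image because the frame was defined by the rules' outer coordinates.

```python
def remove_rule_fragments(rects, frame_w, frame_h):
    """Step 38: drop bounding rectangles that touch the frame border.

    rects: (Xs, Ys, Xe, Ye) tuples in the coordinates of the frame image
    of size frame_w x frame_h.  A rectangle touching any edge of that image
    is taken to be a ruled-line fragment rather than a character and is
    removed; the survivors feed the character-line generation of steps
    39-42.
    """
    kept = []
    for (x0, y0, x1, y1) in rects:
        touches = (x0 == 0 or y0 == 0
                   or x1 == frame_w - 1 or y1 == frame_h - 1)
        if not touches:
            kept.append((x0, y0, x1, y1))
    return kept
```

Note that this also discards genuine characters that touch the frame border; the correction/deletion of steps 40 and 41 is presumably where such cases would be handled.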
Next, a character extraction/recognition unit 27 cuts characters out of each character line image and recognizes them (processing step 44).
As is clear from the above description, according to the present invention, even when a document image is input at an angle, the recognized frame does not become inappropriately narrower than the actual frame, so missing character images are prevented. Moreover, the ruled lines forming a frame are removed before character extraction, by eliminating those bounding rectangles of the black connected components within the recognized frame that touch the frame. Characters can therefore be cut out and recognized accurately.
FIG. 1 is a block diagram showing one embodiment of the present invention; FIG. 2 is a flowchart of the processing; FIG. 3 is an explanatory diagram of frame recognition; FIG. 4 is an explanatory diagram of the problem with frame recognition by the conventional method; and FIG. 5 is an explanatory diagram of a ruled-line rectangle.
11: binary image input unit; 13: table area recognition unit; 15: main-scanning-direction line segment extraction unit; 17: sub-scanning-direction line segment extraction unit; 19: frame recognition unit; 21: frame area extraction unit; 23: black-connected-component bounding rectangle extraction unit; 25: line extraction unit; 27: character extraction/recognition unit.
Claims (1)

(1) A table processing method characterized by: extracting, in a table area, rectangles bounding the ruled lines in the main scanning direction and the sub-scanning direction; recognizing a frame surrounded by the ruled lines using the outer coordinates of the ruled-line rectangles; obtaining the bounding rectangles of the black connected components within the frame; removing the bounding rectangles that touch the frame; and cutting out the characters within the frame using the remaining bounding rectangles.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP1314519A JP2851089B2 (en) | 1989-11-30 | 1989-11-30 | Table processing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP1314519A JP2851089B2 (en) | 1989-11-30 | 1989-11-30 | Table processing method |
Publications (2)
Publication Number | Publication Date |
---|---|
JPH03172984A true JPH03172984A (en) | 1991-07-26 |
JP2851089B2 JP2851089B2 (en) | 1999-01-27 |
Family
ID=18054260
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP1314519A Expired - Fee Related JP2851089B2 (en) | 1989-11-30 | 1989-11-30 | Table processing method |
Country Status (1)
Country | Link |
---|---|
JP (1) | JP2851089B2 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5848186A (en) * | 1995-08-11 | 1998-12-08 | Canon Kabushiki Kaisha | Feature extraction system for identifying text within a table image |
US5898795A (en) * | 1995-12-08 | 1999-04-27 | Ricoh Company, Ltd. | Character recognition method using a method for deleting ruled lines |
US7660014B2 (en) | 2006-01-17 | 2010-02-09 | Konica Minolta Business Technologies, Inc. | Image processing apparatus capable of extracting rule from document image with high precision |
US8208744B2 (en) | 2006-01-23 | 2012-06-26 | Konica Minolta Business Technologies, Inc. | Image processing apparatus capable of accurately and quickly determining character part included in image |
Also Published As
Publication number | Publication date |
---|---|
JP2851089B2 (en) | 1999-01-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP2940936B2 (en) | Tablespace identification method | |
US5075895A (en) | Method and apparatus for recognizing table area formed in binary image of document | |
JP4189506B2 (en) | Apparatus, method and recording medium for image processing | |
JPH03122773A (en) | Image forming device | |
JP4077094B2 (en) | Color document image recognition device | |
JPH03172984A (en) | Table processing method | |
JPH0991453A (en) | Image processing method and its unit | |
JP2851087B2 (en) | Table processing method | |
JP2890306B2 (en) | Table space separation apparatus and table space separation method | |
JP2800192B2 (en) | High-speed character / graphic separation device | |
JP4040231B2 (en) | Character extraction method and apparatus, and storage medium | |
JP3140079B2 (en) | Ruled line recognition method and table processing method | |
JPH0468481A (en) | Character segmenting device | |
JPS61190679A (en) | Character data processing device | |
JPH0728934A (en) | Document image processor | |
JP2887803B2 (en) | Document image processing device | |
JPH09161007A (en) | Method for recognizing character in table area | |
JPH04106670A (en) | Document picture processor | |
JPH05128305A (en) | Area dividing method | |
JPH0728933A (en) | Character recognition device | |
JP3566738B2 (en) | Shaded area processing method and shaded area processing apparatus | |
JPH05174178A (en) | Character recognizing method | |
JPH02253386A (en) | Character recognizing device | |
JPH04167194A (en) | Table processing system | |
JPH0266681A (en) | Drawing processor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FPAY | Renewal fee payment (event date is renewal date of database) |
Free format text: PAYMENT UNTIL: 20071113 Year of fee payment: 9 |
|
FPAY | Renewal fee payment (event date is renewal date of database) |
Free format text: PAYMENT UNTIL: 20081113 Year of fee payment: 10 |
|
FPAY | Renewal fee payment (event date is renewal date of database) |
Free format text: PAYMENT UNTIL: 20091113 Year of fee payment: 11 |
|
LAPS | Cancellation because of no payment of annual fees |