JPH01166184A - Document processor - Google Patents

Document processor

Info

Publication number
JPH01166184A
JPH01166184A JP62326231A JP32623187A
Authority
JP
Japan
Prior art keywords
ruled
ruled line
line
character
line segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP62326231A
Other languages
Japanese (ja)
Other versions
JP2602259B2 (en)
Inventor
Katsumi Hosokawa
勝美 細川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuji Electric Co Ltd
Original Assignee
Fuji Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuji Electric Co Ltd filed Critical Fuji Electric Co Ltd
Priority to JP62326231A priority Critical patent/JP2602259B2/en
Publication of JPH01166184A publication Critical patent/JPH01166184A/en
Application granted granted Critical
Publication of JP2602259B2 publication Critical patent/JP2602259B2/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current


Landscapes

  • Character Input (AREA)

Abstract

PURPOSE: To improve the efficiency of recognizing characters in a table portion by recognizing characters contained in regions delimited by ruled lines as belonging to that table portion.

CONSTITUTION: A line segment data memory 4 stores data on the start point and end point of each line segment in the document information input through an image input unit 1. A ruled line recognition unit 15 recognizes line segments that continue in the same direction as a single ruled line based on the data stored in the memory 4, and stores data on the start point and end point of each ruled line in a ruled line data memory 5. Based on the data stored in the memory 5 and the video signal from the image input unit 1, an in-ruled-line character output unit 16 then sends the character parts contained in the regions delimited by the ruled lines to a character recognition unit 17, which recognizes the characters.

Description

[Detailed Description of the Invention]

[Industrial Application Field]

This invention relates to a document processing device that identifies the table portion of document information, which is broadly divided into text, table, and graphic portions, recognizes the ruled lines that make up the table portion, and has the characters within those ruled lines recognized by a character recognition unit.

[Conventional Technology]

Conventionally, recognition processing of document information has mainly targeted the text portion: the text is first divided into rows or columns of characters, the individual characters are then separated (segmented), and character recognition is finally performed based on an optical reading method.

[Problems to Be Solved by the Invention]

In general, document information is broadly divided into a text portion, a table portion, and a graphic portion; the table portion in turn consists of ruled lines and the character parts (more precisely, letters, numerals, symbols, and so on) inside them. The text portion can currently be recognized at a certain level of accuracy by character recognition means based mainly on the optical reading method described above, and the graphic portion likewise by figure recognition means based on a similar method. The character parts belonging to the table portion, however, are written inside ruled lines, and this special condition causes several problems: (1) the line spacing and character spacing differ from ordinary text, making the characters hard to recognize; (2) even when they can be recognized, recognition takes a long time; (3) separate pieces of text delimited by ruled lines may be misrecognized as one continuous sentence; and (4) vertical and horizontal writing may be confused, leading to misrecognition.

For example, in the table portion shown in Fig. 10(a), the region to the left of the center of the ruled lines contains one line of characters, "abc", while the central region contains two lines, "defg" and "hijk". If the ruled lines are removed as in Fig. 10(b) and the characters are then examined in the horizontal direction, the one-line part branches into a two-line part, so the decision becomes ambiguous. Likewise, in the table portion of Fig. 11(a), "qrst" and "uvwxyz" are written in adjacent ruled-line regions; if the ruled lines are removed as in Fig. 11(b), they risk being misread as a single continuous string of characters, "qrstuvwxyz".

In other words, with the conventional technology individual characters are recognized correctly to a fairly high level, but this is insufficient for recognizing the characters contained in a table portion: the recognition rate is low and recognition takes a long time. The object of this invention is to eliminate these problems of the prior art and to provide a document processing device that can recognize the table portion of document information accurately and quickly.

[Means for Solving the Problems]

To achieve the above object, the document processing device according to the present invention is a device that identifies the table portion of document information, which is broadly divided into a text portion, a table portion, and a graphic portion, recognizes the ruled lines that make up the table portion, and has the character parts within the ruled lines recognized by a character recognition unit, the device comprising: a line segment data storage unit that stores data on the start point and end point of each line segment in the document information input through an image input unit; a ruled line recognition unit that, based on the data stored in the line segment data storage unit, recognizes line segments that continue in the same direction as a single ruled line; a ruled line data storage unit that stores data on the start point and end point of each ruled line recognized by the ruled line recognition unit; and an in-ruled-line character output unit that, based on the data stored in the ruled line data storage unit and the video signal from the image input unit, sends the character parts contained in the regions delimited by the ruled lines to the character recognition unit.
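To make the claimed arrangement easier to follow, here is a minimal Python sketch of how the four units and their data could be wired together. It is an illustrative reading of the claim, not the patent's implementation; every name in it (`LineSegment`, `RuledLine`, `process_document`, and the helper callables) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[int, int]        # (x, y) pixel coordinates
Region = Tuple[Point, Point]   # a cell delimited by ruled lines: (top-left, bottom-right)

@dataclass
class LineSegment:
    start: Point   # free end or intersection where the segment begins
    end: Point     # free end or intersection where the segment ends

@dataclass
class RuledLine:
    start: Point                  # start point of the whole ruled line
    end: Point                    # end point of the whole ruled line
    segments: List[LineSegment]   # the consecutive, same-direction segments merged into it

def process_document(
    image,
    extract_segments: Callable[[object], List[LineSegment]],                 # line segment data
    merge_into_ruled_lines: Callable[[List[LineSegment]], List[RuledLine]],  # ruled line recognition
    split_into_regions: Callable[[List[RuledLine]], List[Region]],           # delimited regions
    recognize_characters: Callable[[object, Region], str],                   # character recognition unit
) -> List[str]:
    """Wire the claimed units together: store the segments, recognize the ruled
    lines, then hand each region delimited by them to the character recognizer."""
    segment_store = extract_segments(image)                   # line segment data storage unit
    ruled_line_store = merge_into_ruled_lines(segment_store)  # ruled line data storage unit
    return [recognize_characters(image, region)               # in-ruled-line character output unit
            for region in split_into_regions(ruled_line_store)]
```

Passing the stages in as callables simply mirrors the claim's division into separate units; the patent does not prescribe any particular software structure, so this decomposition is only one possible reading.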

[Operation]

The line segment data storage unit stores data on the start point and end point of each line segment in the document information input through the image input unit. Based on the data stored in the line segment data storage unit, the ruled line recognition unit recognizes line segments that continue in the same direction as a single ruled line and stores data on the start point and end point of each ruled line in the ruled line data storage unit. Based on the data stored in the ruled line data storage unit and the video signal from the image input unit, the in-ruled-line character output unit sends the character parts contained in the regions delimited by the ruled lines to the character recognition unit, and character recognition is performed by the character recognition unit.

[Embodiment]

An embodiment of the document processing device according to the present invention is described below with reference to the drawings.

Fig. 1 is a block diagram showing the configuration of this document processing device. The document processing device 20 consists broadly of an image input unit 1, various memories, and various processing units.

The memories are: an original image memory 2, which stores the data of the document video signal from the image input unit 1; a thinned image memory 3, which stores that data after it has been thinned by a thinning processing unit 13 described later; a line segment data memory 4, which stores the data of the individual line segments that make up the ruled lines; a ruled line data memory 5, which stores the data of the ruled lines composed of those line segments; and a graphic memory 6, which stores the data of the graphic portion of the document information. The graphic memory 6 is not specifically dealt with in the document processing device 20 according to this invention.

The processing units are: the thinning processing unit 13 mentioned above; a line segment coordinate determination unit 14, which determines the coordinates of the start point and end point of each line segment making up a ruled line on the basis of the data in the thinned image memory 3; a ruled line recognition unit 15, which recognizes each ruled line on the basis of the data in the line segment data memory 4 and determines the coordinates of its start point and end point; an in-ruled-line character output unit 16, which identifies and outputs the characters contained within the ruled lines; and a character recognition unit 17, which receives this character output and recognizes it.
The operation of the document processing device 20 is described below with reference mainly to the flowchart of Fig. 2 and, as a supplement, to the block diagram of Fig. 1 and the other figures.

In Fig. 2, at step S1 the video signal carrying the document information from the image input unit 1 (see Fig. 1) is stored in the original image memory 2. The "thinning processing" of step S2 is performed by the thinning processing unit 13 on the video signal data stored in the original image memory 2; it is a kind of image shaping that converts the image into thin lines of a predetermined width, and the result is stored in the thinned image memory 3.

At step S3 the "determination of line segment coordinates" is performed by the line segment coordinate determination unit 14 based on the data stored in the thinned image memory 3; this is the process of determining the coordinates of the start point and end point of each line segment that makes up a ruled line. Here, a line segment means a straight portion delimited by free ends or intersections, and its start and end points correspond to those free ends or intersections.
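Free ends and intersections of a thinned, one-pixel-wide line drawing can be located by classifying each foreground pixel by its number of 8-connected neighbours: a pixel with exactly one neighbour is a free end, and one with three or more is an intersection. The following is a minimal sketch under that assumption, using a binary NumPy array; it is offered as one plausible way to obtain the segment start and end points of step S3, not as the patent's own method.

```python
import numpy as np

def find_segment_endpoints(thin: np.ndarray):
    """Classify pixels of a thinned (one-pixel-wide) binary image.

    A foreground pixel with exactly one 8-connected foreground neighbour is a
    free end; one with three or more is an intersection.  These points are the
    candidate start/end coordinates of the line segments of step S3.
    """
    h, w = thin.shape
    free_ends, intersections = [], []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if not thin[y, x]:
                continue
            # count 8-neighbours, excluding the centre pixel itself
            n = int(thin[y - 1:y + 2, x - 1:x + 2].sum()) - 1
            if n == 1:
                free_ends.append((x, y))
            elif n >= 3:
                intersections.append((x, y))
    return free_ends, intersections
```

Border pixels are skipped for simplicity; a production version would pad the image or handle the border explicitly.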
The determination of line segment coordinates is explained more concretely with reference to Figs. 3 and 4. Fig. 3 is an example of a table portion in document information, and Fig. 4 shows the line segments that make up the ruled lines of this table portion; the letters of the alphabet inside the ruled lines indicate entered characters. As shown in Fig. 4, the ruled lines are made up of the set of line segments between their intersections, and each line segment is expressed by the circled numbers assigned to its two intersections: line segment L1 by one such pair of intersection numbers, line segment L2 by the next, and so on up to line segment L21. The coordinates of the start and end points of each line segment are the coordinates of these numbered intersections, and since the coordinates of the intersections can easily be obtained by well-known image processing techniques, the start and end coordinates of every line segment are thereby determined.

At step S4 the "determination of ruled line coordinates" is performed by the ruled line recognition unit 15 based on the data from the line segment data memory 4; this is the process of determining the coordinates of the start point and end point of each ruled line that makes up the table portion. A ruled line is defined as a set of line segments having the same direction; in Fig. 5, which shows the ruled lines making up the table portion, they are labelled K1 to K7. The intersection numbers in Fig. 5 are the same as those in Fig. 4. Expressing each ruled line by the numbers of its start and end points gives, for the ruled lines K1 to K7, the correspondence table shown in Fig. 6, and obtaining this correspondence table is what the determination of the ruled line coordinates amounts to.
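Step S4 can be pictured as merging chains of collinear, touching segments into single ruled lines. The sketch below does this for axis-parallel segments only, which matches the tables in the figures; the function name, the tuple representation, and the `tolerance` parameter (which absorbs small gaps left by thinning) are assumptions of this sketch rather than details given in the patent.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[int, int]
Segment = Tuple[Point, Point]   # (start, end) of one line segment

def merge_segments_into_ruled_lines(segments: List[Segment],
                                    tolerance: int = 2) -> List[Segment]:
    """Group axis-parallel segments that lie on the same row or column and
    touch end-to-start, and report each group as one ruled line (start, end).

    This mirrors step S4: a ruled line is a set of consecutive segments with
    the same direction.  Only horizontal and vertical segments are handled.
    """
    horizontal: Dict[int, List[Segment]] = defaultdict(list)
    vertical: Dict[int, List[Segment]] = defaultdict(list)
    for (x1, y1), (x2, y2) in segments:
        if y1 == y2:                       # horizontal segment, normalise left-to-right
            horizontal[y1].append(((min(x1, x2), y1), (max(x1, x2), y1)))
        elif x1 == x2:                     # vertical segment, normalise top-to-bottom
            vertical[x1].append(((x1, min(y1, y2)), (x1, max(y1, y2))))

    ruled_lines: List[Segment] = []
    for groups, axis in ((horizontal, 0), (vertical, 1)):
        for segs in groups.values():
            segs.sort(key=lambda s: s[0][axis])
            current_start, current_end = segs[0]
            for start, end in segs[1:]:
                if start[axis] <= current_end[axis] + tolerance:   # continues the chain
                    if end[axis] > current_end[axis]:
                        current_end = end
                else:                                              # gap: close and start anew
                    ruled_lines.append((current_start, current_end))
                    current_start, current_end = start, end
            ruled_lines.append((current_start, current_end))
    return ruled_lines
```

For example, `merge_segments_into_ruled_lines([((0, 0), (5, 0)), ((5, 0), (9, 0))])` returns `[((0, 0), (9, 0))]`: two consecutive horizontal segments reported as one ruled line, just as consecutive segments of Fig. 4 are reported as single ruled lines such as K1 to K7 in Fig. 5.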
Incidentally, besides the form already shown in Fig. 3, in which the outside of the table forms a closed frame with vertical and horizontal ruled lines arranged inside it, table ruled lines come in various other forms, such as those of Figs. 7(a), 7(b) and 7(c). Compared with the ruled lines of Fig. 3, the table of Fig. 7(a) lacks the vertical ruled lines on its left and right sides, that of Fig. 7(b) lacks the internal vertical ruled lines, and that of Fig. 7(c) has no vertical ruled lines at all.

At step S5 the "determination of the regions delimited by ruled lines" is carried out; this is the process of determining the regions bounded by two to four ruled lines, and it is performed by the ruled line recognition unit 15 in connection with the determination of the ruled line coordinates described above. It is explained next with reference to Figs. 8 and 9. Fig. 8 shows the regions delimited by the ruled lines: Fig. 8(a) is an overall view of the ruled lines, Fig. 8(b) shows the largest region they delimit, Fig. 8(c) the intermediate regions, and Fig. 8(d) the final regions.

First, as shown in Fig. 8(b), the region M0 is determined by the four ruled lines K1, K2, K7 and K8 that form the outer frame. Next, as shown in Fig. 8(c), these ruled lines K1, K2, K7 and K8 together with the two horizontal ruled lines K3 and K5 subdivide the region M0 into the regions M1, M2 and M3. Then, as shown in Fig. 8(d), the two vertical ruled lines K4 and K6 subdivide these further to determine new regions, in two stages: first by the ruled line K4 and then by the ruled line K6. That is, the ruled line K4 subdivides the region M1 into the regions M11 and M12, the region M2 into the regions M21 and M22 (shown with a broken frame), and the region M3 into the regions M31 and M32 (broken frame). The ruled line K6 then subdivides the region M22 into the regions M23 and M24, and the region M32 into the regions M33 and M34.

Fig. 9 shows the hierarchical structure of the regions delimited by the ruled lines; it arranges the regions in the order in which they are determined. The labels attached to the circles are the region labels above: for example, the region M0 is subdivided into the regions M1, M2 and M3, the region M1 is then subdivided into the regions M11 and M12, and so on.
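The subdivision of Figs. 8 and 9, in which the outer frame is first cut into bands by the horizontal ruled lines and each band is then cut by the vertical ruled lines that span it, can be sketched as follows. For brevity this version applies all vertical ruled lines crossing a band in one pass, so it yields a two-level hierarchy rather than the deeper tree of Fig. 9, where each vertical ruled line adds a further level; the data types and the `subdivide` function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[int, int, int, int]   # (left, top, right, bottom)
HLine = Tuple[int, int, int]       # horizontal ruled line: (y, x_left, x_right)
VLine = Tuple[int, int, int]       # vertical ruled line:   (x, y_top, y_bottom)

@dataclass
class RegionNode:
    rect: Rect
    children: List["RegionNode"] = field(default_factory=list)

def subdivide(frame: Rect, h_lines: List[HLine], v_lines: List[VLine]) -> RegionNode:
    """Build a region hierarchy in the spirit of Figs. 8-9: the outer frame (M0)
    is cut into bands by the horizontal ruled lines spanning it (M1, M2, M3),
    and each band is then cut by the vertical ruled lines spanning that band."""
    left, top, right, bottom = frame
    root = RegionNode(frame)

    # Cut the frame into horizontal bands at every horizontal ruled line that spans it.
    ys = sorted(y for (y, xl, xr) in h_lines if top < y < bottom and xl <= left and xr >= right)
    bands: List[Rect] = []
    prev_y = top
    for y in ys + [bottom]:
        bands.append((left, prev_y, right, y))
        prev_y = y

    # Cut each band into cells at every vertical ruled line that spans that band.
    for (bl, bt, br, bb) in bands:
        band = RegionNode((bl, bt, br, bb))
        xs = sorted(x for (x, yt, yb) in v_lines if bl < x < br and yt <= bt and yb >= bb)
        prev_x = bl
        for x in xs + [br]:
            band.children.append(RegionNode((prev_x, bt, x, bb)))
            prev_x = x
        root.children.append(band)
    return root
```

For instance, `subdivide((0, 0, 400, 300), [(100, 0, 400), (200, 0, 400)], [(200, 0, 300)])` yields a root region with three bands, each cut into two cells, the analogue of M0 being split into M1, M2 and M3 and each of those into two parts by a single interior vertical line.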
Returning to Fig. 2, at step S6 only the characters inside the regions delimited by the ruled lines are sent to the character recognition unit 17 by the in-ruled-line character output unit 16, based on the data from the ruled line data memory 5 and the data from the original image memory 2, and character recognition is performed there (step S7).

Although not directly related to this invention, data on the graphic information inside the ruled lines is stored in the graphic memory 6 on the basis of the data in the original image memory 2 and the ruled line recognition unit 15. Based on the data in the graphic memory 6, the graphics inside the ruled lines are then recognized by an in-ruled-line graphic output unit (not shown) and a graphic recognition unit; this processing corresponds to the "output of graphics within ruled lines" of step S8 and the "graphic recognition" of step S9.

[Effects of the Invention]

As explained above, in this invention the line segment data storage unit stores data on the start point and end point of each line segment in the document information input through the image input unit; the ruled line recognition unit, based on the data stored in the line segment data storage unit, recognizes line segments that continue in the same direction as a single ruled line and stores data on the start point and end point of each ruled line in the ruled line data storage unit; and the in-ruled-line character output unit, based on the data stored in the ruled line data storage unit and the video signal from the image input unit, sends the character parts contained in the regions delimited by the ruled lines to the character recognition unit, which performs the character recognition.

Consequently, this invention offers the following advantages over the conventional technology.

(1) The ruled lines are recognized by the ruled line recognition unit of the device, and the characters contained in the regions delimited by the ruled lines are recognized by the character recognition unit as belonging to the table portion, without being confused with the text portion; as a result, the recognition rate for the table portion can be improved.

(2) Characters belonging to the same group of ruled lines can be treated as related, which speeds up their recognition processing and shortens the overall time required to recognize the table portion.

[Brief Description of the Drawings]

Fig. 1 is a block diagram showing the configuration of an embodiment of the present invention; Fig. 2 is a flowchart showing the operation of this embodiment; Fig. 3 is an illustration of an example of a table portion in document information; Fig. 4 is a diagram showing the line segments that make up the ruled lines of this table portion; Fig. 5 is a diagram showing the ruled lines that make up this table portion; Fig. 6 is a diagram showing the correspondence between each ruled line and its start and end points; Figs. 7(a), 7(b) and 7(c) are diagrams of other forms of ruled lines; Fig. 8 shows regions delimited by ruled lines, where (a) is an overall view of the ruled lines, (b) shows the largest region delimited by them, (c) the intermediate regions, and (d) the final regions; Fig. 9 is a diagram showing the hierarchical structure of the regions delimited by the ruled lines; Figs. 10 and 11 are explanatory diagrams of examples in which errors may occur in the conventional recognition of characters within ruled lines, where in each case (a) shows the case with ruled lines present and (b) the case with the ruled lines removed.

Reference numerals: 1: image input unit; 2: original image memory; 3: thinned image memory; 4: line segment data memory; 5: ruled line data memory; 6: graphic memory; 13: thinning processing unit; 14: line segment coordinate determination unit; 15: ruled line recognition unit; 16: in-ruled-line character output unit; 17: character recognition unit; 20: document processing device.

Claims (1)

[Claims]

A document processing device that identifies the table portion of document information, which is broadly divided into a text portion, a table portion, and a graphic portion, recognizes the ruled lines that make up the table portion, and has the character parts within the ruled lines recognized by a character recognition unit, the device comprising: a line segment data storage unit that stores data on the start point and end point of each line segment in the document information input through an image input unit; a ruled line recognition unit that, based on the data stored in the line segment data storage unit, recognizes line segments that continue in the same direction as a single ruled line; a ruled line data storage unit that stores data on the start point and end point of each ruled line recognized by the ruled line recognition unit; and an in-ruled-line character output unit that, based on the data stored in the ruled line data storage unit and the video signal from the image input unit, sends the character parts contained in the regions delimited by the ruled lines to the character recognition unit.
JP62326231A 1987-12-22 1987-12-22 Document processing device Expired - Lifetime JP2602259B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP62326231A JP2602259B2 (en) 1987-12-22 1987-12-22 Document processing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP62326231A JP2602259B2 (en) 1987-12-22 1987-12-22 Document processing device

Publications (2)

Publication Number Publication Date
JPH01166184A true JPH01166184A (en) 1989-06-30
JP2602259B2 JP2602259B2 (en) 1997-04-23

Family

ID=18185455

Family Applications (1)

Application Number Title Priority Date Filing Date
JP62326231A Expired - Lifetime JP2602259B2 (en) 1987-12-22 1987-12-22 Document processing device

Country Status (1)

Country Link
JP (1) JP2602259B2 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS57211674A (en) * 1981-06-23 1982-12-25 Ricoh Co Ltd Frame recognizing method

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS57211674A (en) * 1981-06-23 1982-12-25 Ricoh Co Ltd Frame recognizing method

Also Published As

Publication number Publication date
JP2602259B2 (en) 1997-04-23

Similar Documents

Publication Publication Date Title
JP2002203207A (en) Character recognizing method and program, and recording medium
JPH01166184A (en) Document processor
US6088666A (en) Method of synthesizing pronunciation transcriptions for English sentence patterns/words by a computer
JPH06215184A (en) Labeling device for extracted area
WO2022025216A1 (en) Information processing device using compression data search engine, and information processing method therefor
JPH1049624A (en) Method and device for handwritten character recognition
JP2538543B2 (en) Character information recognition device
JP3072126B2 (en) Method and apparatus for identifying typeface
JPH0896080A (en) Optical character reader
JP2890241B2 (en) Optical character recognition device
JPS61133487A (en) Character recognizing device
JP2519782B2 (en) Character separation method
JP2740506B2 (en) Image recognition method
JPH04342089A (en) Character input procedding method
JP2784004B2 (en) Character recognition device
JP2987877B2 (en) Character recognition method
JPH02253340A (en) Knowledge processing method applying picture processing
JP2743995B2 (en) Character reader
CN114758349A (en) Format identification method, audio file acquisition method and auxiliary reading equipment
JPS61158389A (en) Rule processor
JPH02136956A (en) Extracting method for layout information
JPH01173178A (en) Extracting system for broken/dotted line string area
JPH04177483A (en) Pattern matching system
JPS5943486A (en) Processing system for extracting circle
JPH04156694A (en) Character recognition system