JPH01166184A - Document processor - Google Patents

Document processor

Info

Publication number
JPH01166184A
JPH01166184A JP62326231A JP32623187A
Authority
JP
Japan
Prior art keywords
ruled
ruled line
line
character
line segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP62326231A
Other languages
Japanese (ja)
Other versions
JP2602259B2 (en)
Inventor
Katsumi Hosokawa
勝美 細川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuji Electric Co Ltd
Original Assignee
Fuji Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuji Electric Co Ltd filed Critical Fuji Electric Co Ltd
Priority to JP62326231A priority Critical patent/JP2602259B2/en
Publication of JPH01166184A publication Critical patent/JPH01166184A/en
Application granted granted Critical
Publication of JP2602259B2 publication Critical patent/JP2602259B2/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current


Landscapes

  • Character Input (AREA)

Abstract

PURPOSE: To improve the efficiency of recognizing characters in a table portion by recognizing characters contained in regions delimited by ruled lines as belonging to that table portion.

CONSTITUTION: A line segment data memory 4 stores data on the start point and end point of each line segment in the document information input through an image input unit 1. A ruled line recognition unit 15 recognizes line segments that continue in the same direction as a single ruled line based on the data stored in the memory 4, and stores data on the start point and end point of each ruled line in a ruled line data memory 5. Based on the data stored in the memory 5 and the video signal from the image input unit 1, an in-ruled-line character output unit 16 then sends the character parts contained in the regions delimited by the ruled lines to a character recognition unit 17, which recognizes the characters.

Description

[Detailed Description of the Invention]

[Industrial Application Field]

This invention relates to a document processing device that identifies the table portion of document information, which is broadly divided into text, table, and graphic portions, recognizes the ruled lines that make up the table portion, and has the characters within those ruled lines recognized by a character recognition unit.

[Conventional Technology]

Conventionally, recognition processing of document information has mainly targeted the text portion: the text is first divided into rows or columns of characters, the individual characters are then separated (segmented), and character recognition is finally performed based on an optical reading method.

[Problems to Be Solved by the Invention]

In general, document information is broadly divided into a text portion, a table portion, and a graphic portion; the table portion in turn consists of ruled lines and the character parts (more precisely, letters, numerals, symbols, and so on) inside them. The text portion can currently be recognized at a certain level of accuracy by character recognition means based mainly on the optical reading method described above, and the graphic portion likewise by figure recognition means based on a similar method. The character parts belonging to the table portion, however, are written inside ruled lines, and this special condition causes several problems: (1) the line spacing and character spacing differ from ordinary text, making the characters hard to recognize; (2) even when they can be recognized, recognition takes a long time; (3) separate pieces of text delimited by ruled lines may be misrecognized as one continuous sentence; and (4) vertical and horizontal writing may be confused, leading to misrecognition.

For example, in the table portion shown in Fig. 10(a), the region to the left of the center of the ruled lines contains one line of characters, "abc", while the central region contains two lines, "defg" and "hijk". If the ruled lines are removed as in Fig. 10(b) and the characters are then examined in the horizontal direction, the one-line part branches into a two-line part, so the decision becomes ambiguous. Likewise, in the table portion of Fig. 11(a), "qrst" and "uvwxyz" are written in adjacent ruled-line regions; if the ruled lines are removed as in Fig. 11(b), they risk being misread as a single continuous string of characters, "qrstuvwxyz".

In other words, with the conventional technology individual characters are recognized correctly to a fairly high level, but this is insufficient for recognizing the characters contained in a table portion: the recognition rate is low and recognition takes a long time. The object of this invention is to eliminate these problems of the prior art and to provide a document processing device that can recognize the table portion of document information accurately and quickly.

[Means for Solving the Problems]

To achieve the above object, the document processing device according to the present invention is a device that identifies the table portion of document information, which is broadly divided into a text portion, a table portion, and a graphic portion, recognizes the ruled lines that make up the table portion, and has the character parts within the ruled lines recognized by a character recognition unit, the device comprising: a line segment data storage unit that stores data on the start point and end point of each line segment in the document information input through an image input unit; a ruled line recognition unit that, based on the data stored in the line segment data storage unit, recognizes line segments that continue in the same direction as a single ruled line; a ruled line data storage unit that stores data on the start point and end point of each ruled line recognized by the ruled line recognition unit; and an in-ruled-line character output unit that, based on the data stored in the ruled line data storage unit and the video signal from the image input unit, sends the character parts contained in the regions delimited by the ruled lines to the character recognition unit.
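To make the claimed arrangement easier to follow, here is a minimal Python sketch of how the four units and their data could be wired together. It is an illustrative reading of the claim, not the patent's implementation; every name in it (`LineSegment`, `RuledLine`, `process_document`, and the helper callables) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[int, int]        # (x, y) pixel coordinates
Region = Tuple[Point, Point]   # a cell delimited by ruled lines: (top-left, bottom-right)

@dataclass
class LineSegment:
    start: Point   # free end or intersection where the segment begins
    end: Point     # free end or intersection where the segment ends

@dataclass
class RuledLine:
    start: Point                  # start point of the whole ruled line
    end: Point                    # end point of the whole ruled line
    segments: List[LineSegment]   # the consecutive, same-direction segments merged into it

def process_document(
    image,
    extract_segments: Callable[[object], List[LineSegment]],                 # line segment data
    merge_into_ruled_lines: Callable[[List[LineSegment]], List[RuledLine]],  # ruled line recognition
    split_into_regions: Callable[[List[RuledLine]], List[Region]],           # delimited regions
    recognize_characters: Callable[[object, Region], str],                   # character recognition unit
) -> List[str]:
    """Wire the claimed units together: store the segments, recognize the ruled
    lines, then hand each region delimited by them to the character recognizer."""
    segment_store = extract_segments(image)                   # line segment data storage unit
    ruled_line_store = merge_into_ruled_lines(segment_store)  # ruled line data storage unit
    return [recognize_characters(image, region)               # in-ruled-line character output unit
            for region in split_into_regions(ruled_line_store)]
```

Passing the stages in as callables simply mirrors the claim's division into separate units; the patent does not prescribe any particular software structure, so this decomposition is only one possible reading.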

[Operation]

The line segment data storage unit stores data on the start point and end point of each line segment in the document information input through the image input unit. Based on the data stored in the line segment data storage unit, the ruled line recognition unit recognizes line segments that continue in the same direction as a single ruled line and stores data on the start point and end point of each ruled line in the ruled line data storage unit. Based on the data stored in the ruled line data storage unit and the video signal from the image input unit, the in-ruled-line character output unit sends the character parts contained in the regions delimited by the ruled lines to the character recognition unit, and character recognition is performed by the character recognition unit.

[Embodiment]

An embodiment of the document processing device according to the present invention is described below with reference to the drawings.

Fig. 1 is a block diagram showing the configuration of this document processing device. The document processing device 20 consists broadly of an image input unit 1, various memories, and various processing units.

The memories are: an original image memory 2, which stores the data of the document video signal from the image input unit 1; a thinned image memory 3, which stores that data after it has been thinned by a thinning processing unit 13 described later; a line segment data memory 4, which stores the data of the individual line segments that make up the ruled lines; a ruled line data memory 5, which stores the data of the ruled lines composed of those line segments; and a graphic memory 6, which stores the data of the graphic portion of the document information. The graphic memory 6 is not specifically dealt with in the document processing device 20 according to this invention.

The processing units are: the thinning processing unit 13 mentioned above; a line segment coordinate determination unit 14, which determines the coordinates of the start point and end point of each line segment making up a ruled line on the basis of the data in the thinned image memory 3; a ruled line recognition unit 15, which recognizes each ruled line on the basis of the data in the line segment data memory 4 and determines the coordinates of its start point and end point; an in-ruled-line character output unit 16, which identifies and outputs the characters contained within the ruled lines; and a character recognition unit 17, which receives this character output and recognizes it.
The operation of the document processing device 20 is described below with reference mainly to the flowchart of Fig. 2 and, as a supplement, to the block diagram of Fig. 1 and the other figures.

In Fig. 2, at step S1 the video signal carrying the document information from the image input unit 1 (see Fig. 1) is stored in the original image memory 2. The "thinning processing" of step S2 is performed by the thinning processing unit 13 on the video signal data stored in the original image memory 2; it is a kind of image shaping that converts the image into thin lines of a predetermined width, and the result is stored in the thinned image memory 3.

At step S3 the "determination of line segment coordinates" is performed by the line segment coordinate determination unit 14 based on the data stored in the thinned image memory 3; this is the process of determining the coordinates of the start point and end point of each line segment that makes up a ruled line. Here, a line segment means a straight portion delimited by free ends or intersections, and its start and end points correspond to those free ends or intersections.
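Free ends and intersections of a thinned, one-pixel-wide line drawing can be located by classifying each foreground pixel by its number of 8-connected neighbours: a pixel with exactly one neighbour is a free end, and one with three or more is an intersection. The following is a minimal sketch under that assumption, using a binary NumPy array; it is offered as one plausible way to obtain the segment start and end points of step S3, not as the patent's own method.

```python
import numpy as np

def find_segment_endpoints(thin: np.ndarray):
    """Classify pixels of a thinned (one-pixel-wide) binary image.

    A foreground pixel with exactly one 8-connected foreground neighbour is a
    free end; one with three or more is an intersection.  These points are the
    candidate start/end coordinates of the line segments of step S3.
    """
    h, w = thin.shape
    free_ends, intersections = [], []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if not thin[y, x]:
                continue
            # count 8-neighbours, excluding the centre pixel itself
            n = int(thin[y - 1:y + 2, x - 1:x + 2].sum()) - 1
            if n == 1:
                free_ends.append((x, y))
            elif n >= 3:
                intersections.append((x, y))
    return free_ends, intersections
```

Border pixels are skipped for simplicity; a production version would pad the image or handle the border explicitly.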
The determination of line segment coordinates is explained more concretely with reference to Figs. 3 and 4. Fig. 3 is an example of a table portion in document information, and Fig. 4 shows the line segments that make up the ruled lines of this table portion; the letters of the alphabet inside the ruled lines indicate entered characters. As shown in Fig. 4, the ruled lines are made up of the set of line segments between their intersections, and each line segment is expressed by the circled numbers assigned to its two intersections: line segment L1 by one such pair of intersection numbers, line segment L2 by the next, and so on up to line segment L21. The coordinates of the start and end points of each line segment are the coordinates of these numbered intersections, and since the coordinates of the intersections can easily be obtained by well-known image processing techniques, the start and end coordinates of every line segment are thereby determined.

At step S4 the "determination of ruled line coordinates" is performed by the ruled line recognition unit 15 based on the data from the line segment data memory 4; this is the process of determining the coordinates of the start point and end point of each ruled line that makes up the table portion. A ruled line is defined as a set of line segments having the same direction; in Fig. 5, which shows the ruled lines making up the table portion, they are labelled K1 to K7. The intersection numbers in Fig. 5 are the same as those in Fig. 4. Expressing each ruled line by the numbers of its start and end points gives, for the ruled lines K1 to K7, the correspondence table shown in Fig. 6, and obtaining this correspondence table is what the determination of the ruled line coordinates amounts to.
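Step S4 can be pictured as merging chains of collinear, touching segments into single ruled lines. The sketch below does this for axis-parallel segments only, which matches the tables in the figures; the function name, the tuple representation, and the `tolerance` parameter (which absorbs small gaps left by thinning) are assumptions of this sketch rather than details given in the patent.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[int, int]
Segment = Tuple[Point, Point]   # (start, end) of one line segment

def merge_segments_into_ruled_lines(segments: List[Segment],
                                    tolerance: int = 2) -> List[Segment]:
    """Group axis-parallel segments that lie on the same row or column and
    touch end-to-start, and report each group as one ruled line (start, end).

    This mirrors step S4: a ruled line is a set of consecutive segments with
    the same direction.  Only horizontal and vertical segments are handled.
    """
    horizontal: Dict[int, List[Segment]] = defaultdict(list)
    vertical: Dict[int, List[Segment]] = defaultdict(list)
    for (x1, y1), (x2, y2) in segments:
        if y1 == y2:                       # horizontal segment, normalise left-to-right
            horizontal[y1].append(((min(x1, x2), y1), (max(x1, x2), y1)))
        elif x1 == x2:                     # vertical segment, normalise top-to-bottom
            vertical[x1].append(((x1, min(y1, y2)), (x1, max(y1, y2))))

    ruled_lines: List[Segment] = []
    for groups, axis in ((horizontal, 0), (vertical, 1)):
        for segs in groups.values():
            segs.sort(key=lambda s: s[0][axis])
            current_start, current_end = segs[0]
            for start, end in segs[1:]:
                if start[axis] <= current_end[axis] + tolerance:   # continues the chain
                    if end[axis] > current_end[axis]:
                        current_end = end
                else:                                              # gap: close and start anew
                    ruled_lines.append((current_start, current_end))
                    current_start, current_end = start, end
            ruled_lines.append((current_start, current_end))
    return ruled_lines
```

For example, `merge_segments_into_ruled_lines([((0, 0), (5, 0)), ((5, 0), (9, 0))])` returns `[((0, 0), (9, 0))]`: two consecutive horizontal segments reported as one ruled line, just as consecutive segments of Fig. 4 are reported as single ruled lines such as K1 to K7 in Fig. 5.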
Incidentally, besides the form already shown in Fig. 3, in which the outside of the table forms a closed frame with vertical and horizontal ruled lines arranged inside it, table ruled lines come in various other forms, such as those of Figs. 7(a), 7(b) and 7(c). Compared with the ruled lines of Fig. 3, the table of Fig. 7(a) lacks the vertical ruled lines on its left and right sides, that of Fig. 7(b) lacks the internal vertical ruled lines, and that of Fig. 7(c) has no vertical ruled lines at all.

At step S5 the "determination of the regions delimited by ruled lines" is carried out; this is the process of determining the regions bounded by two to four ruled lines, and it is performed by the ruled line recognition unit 15 in connection with the determination of the ruled line coordinates described above. It is explained next with reference to Figs. 8 and 9. Fig. 8 shows the regions delimited by the ruled lines: Fig. 8(a) is an overall view of the ruled lines, Fig. 8(b) shows the largest region they delimit, Fig. 8(c) the intermediate regions, and Fig. 8(d) the final regions.

First, as shown in Fig. 8(b), the region M0 is determined by the four ruled lines K1, K2, K7 and K8 that form the outer frame. Next, as shown in Fig. 8(c), these ruled lines K1, K2, K7 and K8 together with the two horizontal ruled lines K3 and K5 subdivide the region M0 into the regions M1, M2 and M3. Then, as shown in Fig. 8(d), the two vertical ruled lines K4 and K6 subdivide these further to determine new regions, in two stages: first by the ruled line K4 and then by the ruled line K6. That is, the ruled line K4 subdivides the region M1 into the regions M11 and M12, the region M2 into the regions M21 and M22 (shown with a broken frame), and the region M3 into the regions M31 and M32 (broken frame). The ruled line K6 then subdivides the region M22 into the regions M23 and M24, and the region M32 into the regions M33 and M34.

Fig. 9 shows the hierarchical structure of the regions delimited by the ruled lines; it arranges the regions in the order in which they are determined. The labels attached to the circles are the region labels above: for example, the region M0 is subdivided into the regions M1, M2 and M3, the region M1 is then subdivided into the regions M11 and M12, and so on.
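The subdivision of Figs. 8 and 9, in which the outer frame is first cut into bands by the horizontal ruled lines and each band is then cut by the vertical ruled lines that span it, can be sketched as follows. For brevity this version applies all vertical ruled lines crossing a band in one pass, so it yields a two-level hierarchy rather than the deeper tree of Fig. 9, where each vertical ruled line adds a further level; the data types and the `subdivide` function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[int, int, int, int]   # (left, top, right, bottom)
HLine = Tuple[int, int, int]       # horizontal ruled line: (y, x_left, x_right)
VLine = Tuple[int, int, int]       # vertical ruled line:   (x, y_top, y_bottom)

@dataclass
class RegionNode:
    rect: Rect
    children: List["RegionNode"] = field(default_factory=list)

def subdivide(frame: Rect, h_lines: List[HLine], v_lines: List[VLine]) -> RegionNode:
    """Build a region hierarchy in the spirit of Figs. 8-9: the outer frame (M0)
    is cut into bands by the horizontal ruled lines spanning it (M1, M2, M3),
    and each band is then cut by the vertical ruled lines spanning that band."""
    left, top, right, bottom = frame
    root = RegionNode(frame)

    # Cut the frame into horizontal bands at every horizontal ruled line that spans it.
    ys = sorted(y for (y, xl, xr) in h_lines if top < y < bottom and xl <= left and xr >= right)
    bands: List[Rect] = []
    prev_y = top
    for y in ys + [bottom]:
        bands.append((left, prev_y, right, y))
        prev_y = y

    # Cut each band into cells at every vertical ruled line that spans that band.
    for (bl, bt, br, bb) in bands:
        band = RegionNode((bl, bt, br, bb))
        xs = sorted(x for (x, yt, yb) in v_lines if bl < x < br and yt <= bt and yb >= bb)
        prev_x = bl
        for x in xs + [br]:
            band.children.append(RegionNode((prev_x, bt, x, bb)))
            prev_x = x
        root.children.append(band)
    return root
```

For instance, `subdivide((0, 0, 400, 300), [(100, 0, 400), (200, 0, 400)], [(200, 0, 300)])` yields a root region with three bands, each cut into two cells, the analogue of M0 being split into M1, M2 and M3 and each of those into two parts by a single interior vertical line.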
Returning to Fig. 2, at step S6 only the characters inside the regions delimited by the ruled lines are sent to the character recognition unit 17 by the in-ruled-line character output unit 16, based on the data from the ruled line data memory 5 and the data from the original image memory 2, and character recognition is performed there (step S7).

Although not directly related to this invention, data on the graphic information inside the ruled lines is stored in the graphic memory 6 on the basis of the data in the original image memory 2 and the ruled line recognition unit 15. Based on the data in the graphic memory 6, the graphics inside the ruled lines are then recognized by an in-ruled-line graphic output unit (not shown) and a graphic recognition unit; this processing corresponds to the "output of graphics within ruled lines" of step S8 and the "graphic recognition" of step S9.

[Effects of the Invention]

As explained above, in this invention the line segment data storage unit stores data on the start point and end point of each line segment in the document information input through the image input unit; the ruled line recognition unit, based on the data stored in the line segment data storage unit, recognizes line segments that continue in the same direction as a single ruled line and stores data on the start point and end point of each ruled line in the ruled line data storage unit; and the in-ruled-line character output unit, based on the data stored in the ruled line data storage unit and the video signal from the image input unit, sends the character parts contained in the regions delimited by the ruled lines to the character recognition unit, which performs the character recognition.

Consequently, this invention offers the following advantages over the conventional technology.

(1) The ruled lines are recognized by the ruled line recognition unit of the device, and the characters contained in the regions delimited by the ruled lines are recognized by the character recognition unit as belonging to the table portion, without being confused with the text portion; as a result, the recognition rate for the table portion can be improved.

(2) Characters belonging to the same group of ruled lines can be treated as related, which speeds up their recognition processing and shortens the overall time required to recognize the table portion.

[Brief Description of the Drawings]

Fig. 1 is a block diagram showing the configuration of an embodiment of the present invention; Fig. 2 is a flowchart showing the operation of this embodiment; Fig. 3 is an illustration of an example of a table portion in document information; Fig. 4 is a diagram showing the line segments that make up the ruled lines of this table portion; Fig. 5 is a diagram showing the ruled lines that make up this table portion; Fig. 6 is a diagram showing the correspondence between each ruled line and its start and end points; Figs. 7(a), 7(b) and 7(c) are diagrams of other forms of ruled lines; Fig. 8 shows regions delimited by ruled lines, where (a) is an overall view of the ruled lines, (b) shows the largest region delimited by them, (c) the intermediate regions, and (d) the final regions; Fig. 9 is a diagram showing the hierarchical structure of the regions delimited by the ruled lines; Figs. 10 and 11 are explanatory diagrams of examples in which errors may occur in the conventional recognition of characters within ruled lines, where in each case (a) shows the case with ruled lines present and (b) the case with the ruled lines removed.

Reference numerals: 1: image input unit; 2: original image memory; 3: thinned image memory; 4: line segment data memory; 5: ruled line data memory; 6: graphic memory; 13: thinning processing unit; 14: line segment coordinate determination unit; 15: ruled line recognition unit; 16: in-ruled-line character output unit; 17: character recognition unit; 20: document processing device.

Claims (1)

[Claims]

A document processing device that identifies the table portion of document information, which is broadly divided into a text portion, a table portion, and a graphic portion, recognizes the ruled lines that make up the table portion, and has the character parts within the ruled lines recognized by a character recognition unit, the device comprising: a line segment data storage unit that stores data on the start point and end point of each line segment in the document information input through an image input unit; a ruled line recognition unit that, based on the data stored in the line segment data storage unit, recognizes line segments that continue in the same direction as a single ruled line; a ruled line data storage unit that stores data on the start point and end point of each ruled line recognized by the ruled line recognition unit; and an in-ruled-line character output unit that, based on the data stored in the ruled line data storage unit and the video signal from the image input unit, sends the character parts contained in the regions delimited by the ruled lines to the character recognition unit.
JP62326231A 1987-12-22 1987-12-22 Document processing device Expired - Lifetime JP2602259B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP62326231A JP2602259B2 (en) 1987-12-22 1987-12-22 Document processing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP62326231A JP2602259B2 (en) 1987-12-22 1987-12-22 Document processing device

Publications (2)

Publication Number Publication Date
JPH01166184A true JPH01166184A (en) 1989-06-30
JP2602259B2 JP2602259B2 (en) 1997-04-23

Family

ID=18185455

Family Applications (1)

Application Number Title Priority Date Filing Date
JP62326231A Expired - Lifetime JP2602259B2 (en) 1987-12-22 1987-12-22 Document processing device

Country Status (1)

Country Link
JP (1) JP2602259B2 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS57211674A (en) * 1981-06-23 1982-12-25 Ricoh Co Ltd Frame recognizing method

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS57211674A (en) * 1981-06-23 1982-12-25 Ricoh Co Ltd Frame recognizing method

Also Published As

Publication number Publication date
JP2602259B2 (en) 1997-04-23

Similar Documents

Publication Publication Date Title
JP2002203207A (en) Character recognizing method and program, and recording medium
JPH01166184A (en) Document processor
US6088666A (en) Method of synthesizing pronunciation transcriptions for English sentence patterns/words by a computer
JPH06215184A (en) Labeling device for extracted area
WO2022025216A1 (en) Information processing device using compression data search engine, and information processing method therefor
JPH1049624A (en) Method and device for handwritten character recognition
JP2538543B2 (en) Character information recognition device
JP3072126B2 (en) Method and apparatus for identifying typeface
JPH0896080A (en) Optical character reader
JP2890241B2 (en) Optical character recognition device
JPS61133487A (en) Character recognizing device
JP2519782B2 (en) Character separation method
JP2740506B2 (en) Image recognition method
JPH04342089A (en) Character input procedding method
JP2784004B2 (en) Character recognition device
JP2987877B2 (en) Character recognition method
JPH02253340A (en) Knowledge processing method applying picture processing
JP2743995B2 (en) Character reader
CN114758349A (en) Format identification method, audio file acquisition method and auxiliary reading equipment
JPS61158389A (en) Rule processor
JPH02136956A (en) Extracting method for layout information
JPH01173178A (en) Extracting system for broken/dotted line string area
JPH04177483A (en) Pattern matching system
JPS5943486A (en) Processing system for extracting circle
JPH04156694A (en) Character recognition system