JPH0628520A - Character recognition device - Google Patents

Character recognition device

Info

Publication number
JPH0628520A
Authority
JP
Japan
Prior art keywords
rectangle
character
area
unit
image data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP4183357A
Other languages
Japanese (ja)
Inventor
Yumiko Ikemure
由美子 池牟▲禮▼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co Ltd
Priority to JP4183357A
Publication of JPH0628520A
Legal status: Pending

Abstract

PURPOSE: To provide a character recognition device capable of correctly extracting a character area enclosed by ruled lines. CONSTITUTION: The device comprises a character rectangle determination unit 10 that classifies image data into character rectangles and other rectangles; a table rectangle determination unit 12 that, for data other than character rectangles, extracts the line components within each rectangle and judges from them whether the rectangle is a table; and a ruled line area extraction unit 13 that, for each circumscribed rectangle judged not to be a table, detects the circumscribed-rectangle black pixel density and the total black pixel density within the rectangle area, and judges from these densities and the line component information whether the area is a ruled line area or a graphic area. The rectangle information of a rectangle judged to be a ruled line area is replaced with the extracted line components.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a character recognition device that, for the purpose of building databases from printed documents and reusing documents, captures a document image using optical means such as a scanner, extracts an area for each attribute (characters, figures, tables, ruled lines, and so on) from the captured image data, and performs recognition processing suited to each attribute.

[0002]

[Prior Art] The conventional method is described below.

[0003] First, circumscribed rectangles are detected from the binary data captured by the scanner. Each rectangle is classified as a character rectangle or a non-character rectangle based on its size and its black pixel density, and character areas are extracted by merging the classified character rectangles.

[0004] For a non-character rectangle, if the minute rectangles inside it reach a predetermined threshold or more, or if the black pixel density within the rectangle is at or above a predetermined threshold, the rectangular area is treated as an image area.

[0005] For the remaining rectangles, which are neither characters nor images, ruled line candidates within each rectangle are detected, and the area is extracted after judging from the line information whether it is a table or a figure.

[0006] When an extracted area overlaps another area, the two are merged into a single area. Character recognition is then performed on character areas, image compression on image areas, table recognition on table areas, and vectorization on graphic areas.

[0007]

[Problem to Be Solved by the Invention] With the conventional method, however, in an example such as that shown in FIG. 3, an area made up of multiple ruled lines, as in FIG. 7, does not satisfy the table conditions and is therefore classified as a graphic area. As a result, the characters inside it end up being vectorized.

[0008] The present invention solves the above problem, and its object is to provide a character recognition device that correctly extracts character areas enclosed by ruled lines.

[0009]

[Means for Solving the Problem] To achieve the above object, the present invention detects, for a circumscribed rectangle that has been classified as a graphic area, the circumscribed-rectangle black pixel density and the total black pixel density within the rectangular area. (In the example of FIG. 3, the circumscribed-rectangle black pixel density is the ratio of the number of black pixels belonging to the ruled lines to the total number of pixels in the circumscribed rectangular area, and the total black pixel density is the ratio of all black pixels in the circumscribed rectangular area, ruled lines and characters combined, to the total number of pixels in that area.) If the total black pixel density is at least a predetermined multiple of the circumscribed-rectangle black pixel density, and the graphic-candidate circumscribed rectangle contains a line component extracted by the line component extraction unit of FIG. 2, that line component is judged to be a ruled line and stored, and the graphic-candidate rectangle from which the ruled line was taken is deleted.

[0010]

[Operation] With the above configuration, the present invention can take out the ruled lines one by one from an area in which multiple ruled lines are joined. A character area enclosed by ruled lines is therefore not merged into the ruled line area and is accurately extracted as a character area.

[0011]

[Embodiment] An embodiment of the present invention is described with reference to the drawings. FIG. 1 is a block diagram showing the hardware configuration of a device that executes area division in this embodiment. In FIG. 1, reference numeral 1 denotes a central processing unit (hereinafter, CPU) that performs area extraction and includes the image data input unit 7, image data storage unit 8, circumscribed rectangle detection unit 9, character rectangle determination unit 10, line component extraction unit 11, table rectangle determination unit 12, ruled line area extraction unit 13, and recognition processing unit 14 shown in FIG. 2. Reference numeral 2 denotes a read-only memory (hereinafter, ROM) in which the area extraction program is stored. Reference numeral 3 denotes a random access memory (hereinafter, RAM), which stores the image data read by the scanner 4. Reference numeral 5 denotes a keyboard for giving commands to the CPU 1 from outside, and 6 denotes a display device that displays the recognition results produced by the CPU 1.

[0012] The area division processing is described below with reference to FIGS. 1 to 10. The image data input unit 7 and the image data storage unit 8 store the binary image data captured by the scanner 4 in the RAM 3 (s1 in the flowchart of FIG. 6). As shown in FIG. 9, the origin of the image coordinates is the upper-left corner; the horizontal coordinate is denoted x and the vertical coordinate y.

[0013] The circumscribed rectangle detection unit 9 detects, from the stored image data, the circumscribed rectangle of each group of black pixels connected under 8-neighbor connectivity, together with the number of black pixels (s2). It stores the rectangle data, namely the upper-left coordinates (x1, y1) and the lower-right coordinates (x2, y2), and the black pixel count in the RAM 3. The circumscribed rectangle of the image data shown in FIG. 9 is as shown in FIG. 10: the coordinates (x1, y1, x2, y2) = (3, 2, 7, 8) and a black pixel count of 15 are stored. For the image of FIG. 3, ten circumscribed rectangles are detected, as shown in FIG. 4.
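The detection step in [0013] can be sketched as a flood fill over 8-connected black pixels. This is only an illustrative sketch, not the implementation described in the patent: the function name `circumscribed_rectangles`, the list-of-rows image representation (1 for black, 0 for white), and the breadth-first traversal are all assumptions made for the example.

```python
from collections import deque


def circumscribed_rectangles(image):
    """Return (x1, y1, x2, y2, black_count) for each 8-connected black component.

    `image` is a list of rows; 1 means black, 0 means white (assumed encoding).
    The origin is the upper-left corner, x is horizontal and y is vertical.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    results = []
    for y0 in range(height):
        for x0 in range(width):
            if image[y0][x0] != 1 or seen[y0][x0]:
                continue
            # Breadth-first flood fill over the 8 neighbours of each pixel.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            x1 = x2 = x0
            y1 = y2 = y0
            count = 0
            while queue:
                x, y = queue.popleft()
                count += 1
                x1, x2 = min(x1, x), max(x2, x)
                y1, y2 = min(y1, y), max(y2, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            results.append((x1, y1, x2, y2, count))
    return results
```

Each returned tuple holds the same data the text says is stored in the RAM 3: the upper-left corner, the lower-right corner, and the black pixel count of one connected component.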

[0014] The character rectangle determination unit 10 judges a circumscribed rectangle to be a non-character rectangle, and continues processing, if the length of its short side is at least a predetermined threshold Th1 (= 25) or if the proportion of black pixels in the area is at most a threshold Th2 (= 15). If the short side is less than Th1 and the circumscribed-rectangle black pixel density exceeds Th2, the rectangle is judged to be a character area and processing advances to s9 (s3). Rectangle 1 in FIG. 4 proceeds to s4 as a non-character rectangle, and rectangles 2 to 10 proceed to s9 as character candidates.

[0015] Here, the black pixel density is obtained as

d1 = (number of black pixels in the circumscribed rectangle) / (rectangle width × rectangle height) × 100 [%].

Rectangle 1 in FIG. 4 has coordinates (x1, y1, x2, y2) = (5, 5, 34, 44) and 98 black pixels, so its black pixel density is

d1 = 98 / ((34 - 5 + 1) × (44 - 5 + 1)) × 100 = 8.17.
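The density formula and the character/non-character rule of step s3 can be checked with a short sketch. The function names are hypothetical, and measuring the short side as an inclusive pixel count (x2 - x1 + 1), as the density formula does for width and height, is an assumption; the thresholds are the values given for the embodiment.

```python
TH1 = 25  # short-side threshold Th1 from the embodiment
TH2 = 15  # black pixel density threshold Th2 from the embodiment, in percent


def black_pixel_density(x1, y1, x2, y2, black_count):
    """d1 = black pixels / (width * height) * 100, with inclusive coordinates."""
    width = x2 - x1 + 1
    height = y2 - y1 + 1
    return black_count / (width * height) * 100


def is_character_rectangle(x1, y1, x2, y2, black_count, th1=TH1, th2=TH2):
    """A rectangle is a character rectangle only if it is small AND dense enough."""
    short_side = min(x2 - x1 + 1, y2 - y1 + 1)  # assumed inclusive measurement
    d1 = black_pixel_density(x1, y1, x2, y2, black_count)
    return short_side < th1 and d1 > th2


# Rectangle 1 of FIG. 4: (5, 5, 34, 44) with 98 black pixels.
print(round(black_pixel_density(5, 5, 34, 44, 98), 2))  # 8.17
print(is_character_rectangle(5, 5, 34, 44, 98))          # False
```

Rectangle 1 fails both character conditions (short side 30 is not below 25, and density 8.17 does not exceed 15), which matches the text sending it on to s4 as a non-character rectangle.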

[0016] For each rectangle remaining as a non-character rectangle, the line component extraction unit 11 performs line component detection to check whether the rectangle contains line components (s4). Line components are extracted by checking, in the horizontal and vertical directions separately, whether a run of black pixels has a length of at least the threshold Th3 (= 10). The line components extracted from rectangle 1 in FIG. 4 are shown in FIG. 5. Based on the detected line components, the table rectangle determination unit 12 then performs table determination. A rectangle is judged to be a table if there are at least Th5 (= 3) detected horizontal lines whose length is at least Th4 (= 4/5) times the rectangle width, at least Th5 vertical lines whose length is at least Th4 times the rectangle height, and, in addition, at least Th6 (= 2) lines crossing any one of those lines (s5). Rectangles judged to be tables jump to s9; those not judged to be tables proceed to ruled line determination. Rectangle 1 in FIG. 4 does not satisfy the table conditions, because its line components consist of two horizontal lines and one vertical line, as shown in FIG. 5. The ruled line area extraction unit 13 therefore performs the ruled line determination described below on rectangles not judged to be tables.
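Steps s4 and s5 can be sketched as run-length scanning followed by a counting test. This is one possible reading, not the patent's implementation: the function names and line tuples are hypothetical, a thick line would produce one run per pixel row in this simplified scan, and the crossing condition is interpreted here as "some long line is crossed by at least Th6 lines of the other direction", which the text does not spell out.

```python
from fractions import Fraction

TH3 = 10               # minimum run length for a line component (Th3)
TH4 = Fraction(4, 5)   # required fraction of the rectangle side (Th4)
TH5 = 3                # minimum number of long lines per direction (Th5)
TH6 = 2                # minimum number of crossing lines (Th6)


def horizontal_runs(image, min_len=TH3):
    """Return (x_start, x_end, y) for each horizontal black run >= min_len."""
    runs = []
    for y, row in enumerate(image):
        start = None
        for x, pixel in enumerate(list(row) + [0]):  # sentinel closes a trailing run
            if pixel == 1 and start is None:
                start = x
            elif pixel != 1 and start is not None:
                if x - start >= min_len:
                    runs.append((start, x - 1, y))
                start = None
    return runs


def vertical_runs(image, min_len=TH3):
    """Return (x, y_start, y_end) for each vertical black run >= min_len."""
    transposed = [list(col) for col in zip(*image)]
    return [(x, y0, y1) for (y0, y1, x) in horizontal_runs(transposed, min_len)]


def crosses(h, v):
    """True if horizontal line h = (x1, x2, y) meets vertical line v = (x, y1, y2)."""
    hx1, hx2, hy = h
    vx, vy1, vy2 = v
    return hx1 <= vx <= hx2 and vy1 <= hy <= vy2


def is_table(width, height, h_lines, v_lines, th4=TH4, th5=TH5, th6=TH6):
    """Apply the three table conditions of step s5 (crossing rule is an assumption)."""
    long_h = [h for h in h_lines if h[1] - h[0] + 1 >= th4 * width]
    long_v = [v for v in v_lines if v[2] - v[1] + 1 >= th4 * height]
    if len(long_h) < th5 or len(long_v) < th5:
        return False
    return (any(sum(crosses(h, v) for v in v_lines) >= th6 for h in long_h)
            or any(sum(crosses(h, v) for h in h_lines) >= th6 for v in long_v))
```

With two horizontal lines and one vertical line, as reported for rectangle 1, `is_table` returns False because neither direction reaches the Th5 count, which agrees with the text.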

[0017] Whether a rectangle is a ruled line area is judged from the relationship between the circumscribed-rectangle black pixel density detected in s3 and the total black pixel density within the rectangle detected in s6. The total black pixel density d2 is calculated by counting all black pixels within the rectangle, dividing that count by the rectangle area, and multiplying by 100. Rectangle 1 in FIG. 4 contains 239 black pixels in total, so d2 = 239 / ((34 - 5 + 1) × (44 - 5 + 1)) × 100 = 19.9 (s6). If the total black pixel density d2 detected in s6 is at least twice the black pixel density d1, and the line component extraction unit 11 has detected a horizontal line at least Th7 (= 4/5) times the rectangle width or a vertical line at least Th7 times the rectangle height, the rectangle is judged to be a ruled line area rectangle (s7). Rectangles not judged to be ruled line areas are judged to be graphic areas and jump to s9. For a rectangle judged to be a ruled line area, its rectangle information is deleted and the line information detected by the line component extraction unit 11 is added in its place (s8). Rectangle 1 in FIG. 4 is judged to be a ruled line area candidate rectangle; rectangle 1 is deleted, and lines 1, 2, and 3 are added as shown in FIG. 5.
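The ruled line test of steps s6 and s7 reduces to a density comparison plus a long-line check, sketched below. The function names are hypothetical, and the line lengths used in the example call are assumed values chosen only for illustration, since the text does not give the lengths of lines 1 to 3 in FIG. 5.

```python
from fractions import Fraction

TH7 = Fraction(4, 5)   # Th7 from the embodiment
DENSITY_MULTIPLE = 2   # "at least twice" in the embodiment


def density(rect, black_count):
    """Black pixel density in percent for rect = (x1, y1, x2, y2), inclusive."""
    x1, y1, x2, y2 = rect
    return black_count / ((x2 - x1 + 1) * (y2 - y1 + 1)) * 100


def is_ruled_line_area(rect, component_black, total_black, h_lengths, v_lengths):
    """True if d2 >= 2 * d1 and at least one sufficiently long line exists."""
    x1, y1, x2, y2 = rect
    width, height = x2 - x1 + 1, y2 - y1 + 1
    d1 = density(rect, component_black)  # black pixels of the component itself
    d2 = density(rect, total_black)      # all black pixels inside the rectangle
    has_long_line = (any(n >= TH7 * width for n in h_lengths)
                     or any(n >= TH7 * height for n in v_lengths))
    return d2 >= DENSITY_MULTIPLE * d1 and has_long_line


rect1 = (5, 5, 34, 44)  # rectangle 1 of FIG. 4
print(round(density(rect1, 239), 1))  # 19.9
# Line lengths below are assumed values, not taken from the patent.
print(is_ruled_line_area(rect1, 98, 239, [30, 30], [40]))  # True
```

For rectangle 1, d2 = 19.9 is more than twice d1 = 8.17, so the outcome depends only on whether a sufficiently long line was extracted; the text states that it is judged a ruled line area candidate.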

[0018] As a result of the above processing, for the image data of FIG. 3, areas are extracted from a total of 12 items of rectangle information: the nine character rectangles (rectangles 2 to 10 in FIG. 4) and the three ruled lines (lines 1, 2, and 3 in FIG. 5). The resulting areas are shown in FIG. 8.

[0019] Next, each extracted area is checked for overlap with other areas, and overlapping areas are merged (s9).
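The merging in step s9 might look like the following sketch. The text does not say how merged areas are represented or whether attributes are compared, so replacing two overlapping rectangles with their common bounding rectangle, ignoring attributes, and repeating until nothing overlaps are all assumptions; the function names are hypothetical.

```python
def overlaps(a, b):
    """True if rectangles a and b, each (x1, y1, x2, y2) inclusive, share a pixel."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def merge_overlapping(regions):
    """Repeatedly replace any overlapping pair with its bounding rectangle."""
    merged = [tuple(r) for r in regions]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if overlaps(merged[i], merged[j]):
                    a, b = merged[i], merged[j]
                    merged[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                 max(a[2], b[2]), max(a[3], b[3]))
                    del merged[j]
                    changed = True
                    break
            if changed:
                break  # restart the scan, since the merged rectangle grew
    return merged


print(merge_overlapping([(0, 0, 4, 4), (3, 3, 8, 8), (20, 20, 25, 25)]))
# [(0, 0, 8, 8), (20, 20, 25, 25)]
```

Restarting the scan after every merge keeps the logic simple and handles chains of rectangles that only overlap once an earlier pair has been combined.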

[0020] For each area obtained as described above, the recognition processing unit 14 proceeds as follows: for a character area, it performs character segmentation and then character recognition; for a graphic area, it vectorizes the figure; and for a table area, it recognizes the table structure and performs character recognition on each cell.

[0021] Thus, according to this embodiment, the device comprises a central processing unit that performs area extraction, a ROM storing the area extraction program, a RAM storing the read image data, and a display device that displays the results recognized by the central processing unit. The central processing unit includes an image data input unit, an image data storage unit, a circumscribed rectangle detection unit, a character rectangle determination unit, a line component extraction unit, a table rectangle determination unit, a ruled line area extraction unit, and a recognition processing unit. It classifies the image data into character rectangles and others, judges whether each non-character rectangle is a table, judges whether each non-table rectangle is a ruled line area or a graphic area, and, for a rectangle judged to be a ruled line area, replaces the rectangle information with line components. Ruled line areas are thus extracted as individual ruled lines rather than as a single graphic area as in the conventional method, so character areas enclosed by ruled lines can be extracted correctly.

[0022] In this embodiment the thresholds were set to Th1 = 25, Th2 = 15, Th3 = 10, Th4 = 4/5, Th5 = 3, Th6 = 2, and Th7 = 4/5, but they are not limited to these values.

[0023]

[Effect of the Invention] As is clear from the above embodiment, according to the present invention a ruled line area in which multiple lines are joined is not extracted as a single graphic area; instead, its ruled lines are taken out one by one. A character area enclosed by ruled lines is therefore not merged into a graphic area and can be accurately extracted as a character area, which makes it possible to provide a character recognition device capable of highly accurate recognition.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the hardware configuration of a device that executes area division in an embodiment of the present invention.

FIG. 2 is a block diagram showing the functional configuration of the device.

FIG. 3 is a schematic diagram showing an example of image data in this embodiment.

FIG. 4 is a schematic diagram of the circumscribed rectangle detection result for FIG. 3.

FIG. 5 is a schematic diagram of the line component extraction result for FIG. 3.

FIG. 6 is a flowchart showing the image area division processing in this embodiment.

FIG. 7 is an explanatory diagram of the coordinates of image data.

FIG. 8 is a schematic diagram showing the area extraction result in this embodiment.

FIG. 9 is a schematic diagram showing an example of image data.

FIG. 10 is a diagram of the circumscribed rectangle of the image data shown in FIG. 9.

[Explanation of Symbols]

1 CPU
2 ROM
3 RAM
4 Scanner
6 Display device
7 Image data input unit
8 Image data storage unit
9 Circumscribed rectangle detection unit
10 Character rectangle determination unit
11 Line component extraction unit
12 Table rectangle determination unit
13 Ruled line area extraction unit

Claims (1)

[Claims]

1. A character recognition device comprising: a scanner that reads characters and generates image data; a RAM that stores the read image data; a ROM in which a character area extraction program is stored; a central processing unit that performs character area extraction processing; and a display device that displays the results recognized by the central processing unit; wherein the central processing unit includes: an image data input and storage unit for storing the image data read by the scanner in the RAM; a circumscribed rectangle detection unit that detects the circumscribed rectangles of black pixels and the black pixels in the image data stored in the RAM, and stores the coordinates of each circumscribed rectangle and its number of black pixels in the RAM; a character rectangle determination unit that compares the size of the circumscribed rectangle with a reference value, compares with a density reference value the black pixel density, which is the ratio of the number of black pixels of the circumscribed rectangle to the total number of pixels in the circumscribed rectangle, and judges from the results whether the circumscribed rectangle is a character rectangle or a non-character rectangle; a line component extraction unit that extracts line components from a non-character rectangle judged by the character rectangle determination unit; a table rectangle determination unit that determines whether the non-character rectangle is a table from the results of comparing the lengths of the horizontal and vertical line segments of the line components extracted by the line component extraction unit with respective length reference values; and a ruled line area extraction unit that, for a rectangle determined not to be a table by the table rectangle determination unit, compares with a total density reference value the total black pixel density, which is the ratio of the total number of black pixels in the rectangle to the total number of pixels in the rectangle, compares the lengths of the horizontal and vertical line segments extracted by the line component extraction unit with respective length reference values, and judges from the results that the rectangle is a ruled line area rectangle; and wherein the central processing unit is configured to delete a rectangle judged to be a ruled line area rectangle by the ruled line area extraction unit, replace it with the line components extracted by the line component extraction unit, and extract the character area enclosed by ruled lines so that character recognition processing is performed.
JP4183357A 1992-07-10 1992-07-10 Character recognition device Pending JPH0628520A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP4183357A JPH0628520A (en) 1992-07-10 1992-07-10 Character recognition device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP4183357A JPH0628520A (en) 1992-07-10 1992-07-10 Character recognition device

Publications (1)

Publication Number Publication Date
JPH0628520A true JPH0628520A (en) 1994-02-04

Family

ID=16134340

Family Applications (1)

Application Number Title Priority Date Filing Date
JP4183357A Pending JPH0628520A (en) 1992-07-10 1992-07-10 Character recognition device

Country Status (1)

Country Link
JP (1) JPH0628520A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003216944A (en) * 2002-01-23 2003-07-31 Fujitsu Ltd Device for combining image
JP2012022575A (en) * 2010-07-15 2012-02-02 Canon Inc Image processing apparatus, image processing method, and program

Similar Documents

Publication Publication Date Title
JP2940936B2 (en) Tablespace identification method
JP3278471B2 (en) Area division method
JP2002133426A (en) Ruled line extracting device for extracting ruled line from multiple image
JP3913985B2 (en) Character string extraction apparatus and method based on basic components in document image
JPH0721310A (en) Document recognizing device
JP3411472B2 (en) Pattern extraction device
JPH0628520A (en) Character recognition device
JPH06187489A (en) Character recognizing device
JP3607753B2 (en) Document image region dividing method and apparatus, and column type discrimination method and apparatus
JP3476595B2 (en) Image area division method and image binarization method
Mitchell et al. Document page segmentation based on pattern spread analysis
US7103220B2 (en) Image processing apparatus, method and program, and storage medium
JPH07160810A (en) Character recognizing device
Okun et al. Robust text detection from binarized document images
JP3095470B2 (en) Character recognition device
JPH0721309A (en) Document recognizing device
JP3060248B2 (en) Table recognition device
JP3406942B2 (en) Image processing apparatus and method
JPH0822507A (en) Document recognition device
JP3100825B2 (en) Line recognition method
JP3517077B2 (en) Pattern extraction device and method for extracting pattern area
JPH07168911A (en) Document recognition device
JP3220226B2 (en) Character string direction determination method
JPS603676B2 (en) Intersection extraction method
JPH03126188A (en) Character recognizing device