JPH0855185A - Character recognition device - Google Patents
Character recognition device
- Publication number
- JPH0855185A
- Authority
- JP
- Japan
- Prior art keywords
- character
- spacing
- line
- area
- unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Character Input (AREA)
- Character Discrimination (AREA)
Abstract
Description
[0001]
[Field of Industrial Application] The present invention relates to a character recognition device that, for the purpose of building databases from printed documents and reusing documents, captures a document image by optical means such as a scanner, extracts regions from the captured image data by attribute (characters, figures, tables, and so on), and performs recognition processing suited to each attribute.
[0002]
[Prior Art] In recent years, image data containing information such as the characters, figures, and tables of documents has frequently been processed by character recognition devices. Such a device converts the character information represented by the image data into code information such as JIS codes (called the recognition result in a character recognition device) for further processing.
[0003] A conventional character recognition device is described below. FIG. 7 is a block diagram showing the configuration of a conventional character recognition device. In FIG. 7, 1 is an image input unit for inputting image data containing character data; 2 is an image storage unit for storing the image data input to the image input unit 1; 3 is a recognition dictionary unit for storing the graphic features of characters; 4 is a circumscribed rectangle detecting means for detecting circumscribed rectangles that circumscribe black pixel regions in the image data stored in the image storage unit 2; 5 is a character candidate rectangle detecting means for detecting character candidate rectangles from the circumscribed rectangles detected by the circumscribed rectangle detecting means 4; 6 is a character spacing and line spacing detecting means for detecting the character spacing and line spacing between the character candidate rectangles detected by the character candidate rectangle detecting means 5; 7 is a line area determining means that determines that rectangles belong to the same line area when the character spacing detected by the character spacing and line spacing detecting means 6 is shorter than a predetermined integration distance; 8 is a character area detecting means for detecting character areas from the line areas determined by the line area determining means 7; 9 is a character recognizing means that obtains a recognition result by collating the character areas detected by the character area detecting means 8 with the graphic features stored in the recognition dictionary unit 3; 10 is a control unit comprising the circumscribed rectangle detecting means 4, the character candidate rectangle detecting means 5, the character spacing and line spacing detecting means 6, the line area determining means 7, the character area detecting means 8, and the character recognizing means 9; 11 is a recognition information storage unit that stores the circumscribed rectangles detected by the circumscribed rectangle detecting means 4, the character candidate rectangles detected by the character candidate rectangle detecting means 5, the character spacing and line spacing detected by the character spacing and line spacing detecting means 6, the line areas determined by the line area determining means 7, the character areas detected by the character area detecting means 8, and the recognition result collated by the character recognizing means 9; and 12 is a recognition result output unit that outputs the recognition result collated by the character recognizing means 9.
[0004] The operation of the character recognition device configured as above is described below. First, image data containing character data is input through the image input unit 1, such as a scanner, and stored in the image storage unit 2. Next, the circumscribed rectangle detecting means 4 detects the coordinates of the rectangles that circumscribe each cluster of connected black pixels in the binary data stored in the image storage unit 2 (hereinafter called circumscribed rectangles) and stores them in the recognition information storage unit 11. Next, the character candidate rectangle detecting means 5 classifies the relatively large circumscribed rectangles detected by the circumscribed rectangle detecting means 4 as figure candidate rectangles and the rest as character candidate rectangles, and stores the coordinates of the figure candidate rectangles and character candidate rectangles in the recognition information storage unit 11. Next, the character spacing and line spacing detecting means 6 measures the horizontal and vertical gaps between the character candidate rectangles to detect the character spacing and the line spacing, and stores them in the recognition information storage unit 11. Next, the line area determining means 7 takes as its criterion the mode of the character spacing detected by the character spacing and line spacing detecting means 6 multiplied by the integration distance, which is a fixed value; rectangles whose character spacing is smaller than this value are determined to belong to the same line area, and the line areas are stored in the recognition information storage unit 11. Next, the character area detecting means 8 detects character areas from the line areas determined by the line area determining means 7 and stores them in the recognition information storage unit 11. Next, the character recognizing means 9 obtains a recognition result by collating the character areas detected by the character area detecting means 8 with the graphic features stored in the recognition dictionary unit 3, and stores the recognition result in the recognition information storage unit 11. Finally, the recognition result output unit 12 outputs the recognition result collated by the character recognizing means 9. FIG. 8 shows the conventional determination of line areas when the character spacing is large, and FIG. 9 shows the conventional determination of line areas when the character spacing is small.
[0005]
[Problems to Be Solved by the Invention] However, as shown in FIG. 8, the conventional character recognition device described above has the problem that text with wide character spacing is not determined to be a single line area. Conversely, as shown in FIG. 9, when the character spacing is narrow, rectangles are determined to be in the same line area even where they should be determined to be in separate line areas.
[0006] Furthermore, when line areas are determined incorrectly, the recognized text does not match the original sentences, so the recognized document cannot be edited properly.
[0007] The present invention solves the above conventional problems, and its object is to provide a character recognition device that determines line areas accurately and quickly and recognizes characters with high reliability.
[0008]
[Means for Solving the Problems] To achieve this object, the character recognition device according to claim 1 of the present invention comprises: an image input unit for inputting image data containing character data; an image storage unit for storing the image data input to the image input unit; a recognition dictionary unit for storing the graphic features of characters; a circumscribed rectangle detecting means for detecting circumscribed rectangles that circumscribe black pixel regions in the image data stored in the image storage unit; a character candidate rectangle detecting means for detecting character candidate rectangles from the circumscribed rectangles detected by the circumscribed rectangle detecting means; a character spacing and line spacing detecting means for detecting the character spacing and line spacing between the character candidate rectangles detected by the character candidate rectangle detecting means; a line area determining means that determines that rectangles belong to the same line area when the character spacing detected by the character spacing and line spacing detecting means is shorter than an integration distance; a character area detecting means for detecting character areas from the line areas determined by the line area determining means; a character recognizing means that obtains a recognition result by collating the character areas detected by the character area detecting means with the graphic features stored in the recognition dictionary unit; a recognition information storage unit that stores the circumscribed rectangles detected by the circumscribed rectangle detecting means, the character candidate rectangles detected by the character candidate rectangle detecting means, the character spacing and line spacing detected by the character spacing and line spacing detecting means, the line areas determined by the line area determining means, the character areas detected by the character area detecting means, and the recognition result collated by the character recognizing means; and a recognition result output unit that outputs the recognition result collated by the character recognizing means. The device further comprises: a character integration strength setting unit that sets a reference value for determining line areas; an integration distance calculating means that calculates the integration distance by multiplying the mode of the character spacing detected by the character spacing and line spacing detecting means by the reference value set in the character integration strength setting unit; and the recognition information storage unit, which stores the reference value set in the character integration strength setting unit and the integration distance calculated by the integration distance calculating means.
[0009]
[Operation] With this configuration, the character integration strength setting unit sets the reference value for determining line areas, the integration distance calculating means calculates the integration distance by multiplying the mode of the character spacing detected by the character spacing and line spacing detecting means by the reference value set in the character integration strength setting unit, and the recognition information storage unit can store both the reference value and the calculated integration distance. The user can therefore observe the state of the image data and decide the integration distance used to determine line areas. Because line areas can then be determined accurately, highly reliable character recognition is possible.
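The calculation described in paragraph [0009] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation disclosed in the patent: the function names `integration_distance` and `group_into_lines`, the `(x1, y1, x2, y2)` tuple used for a rectangle, and the simplification that all rectangles lie on one horizontal band are hypothetical.

```python
from collections import Counter


def integration_distance(char_gaps, reference_value):
    """Return the mode of the character spacings times the reference value."""
    if not char_gaps:
        raise ValueError("at least one character spacing is required")
    # most_common(1) yields the most frequent spacing, i.e. the mode.
    mode_gap = Counter(char_gaps).most_common(1)[0][0]
    return mode_gap * reference_value


def group_into_lines(rects, reference_value):
    """Group rectangles (x1, y1, x2, y2) into line areas.

    Assumption: the rectangles lie on a single horizontal band, so only
    the horizontal gap between neighbours is considered.
    """
    rects = sorted(rects, key=lambda r: r[0])
    gaps = [right[0] - left[2] for left, right in zip(rects, rects[1:])]
    if not gaps:
        return [rects] if rects else []
    limit = integration_distance(gaps, reference_value)
    lines = [[rects[0]]]
    for rect, gap in zip(rects[1:], gaps):
        if gap < limit:
            lines[-1].append(rect)   # shorter than the integration distance
        else:
            lines.append([rect])     # start a new line area
    return lines
```

Passing a larger reference value raises the integration distance, so more widely spaced rectangles end up in the same line area; this is the knob the user is given in place of the fixed value of the conventional device.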
[0010]
[Embodiment] An embodiment of the present invention is described below with reference to the drawings.
[0011] FIG. 1 is a functional block diagram showing the configuration of a character recognition device according to an embodiment of the present invention, and FIG. 2 is a device block diagram showing the configuration of the character recognition device according to the same embodiment.
[0012] In FIG. 1, 1 is the image input unit, 2 is the image storage unit, 3 is the recognition dictionary unit, 4 is the circumscribed rectangle detecting means, 5 is the character candidate rectangle detecting means, 6 is the character spacing and line spacing detecting means, 7 is the line area determining means, 8 is the character area detecting means, 9 is the character recognizing means, and 12 is the recognition result output unit. These are the same as in the conventional example, so they carry the same reference numerals and their description is omitted.
[0013] Reference numeral 13 is a character integration strength setting unit that sets the reference value for determining line areas; 14 is an integration distance calculating means that calculates the integration distance by multiplying the mode of the character spacing detected by the character spacing and line spacing detecting means 6 by the reference value set in the character integration strength setting unit 13; 15 is a control unit comprising the circumscribed rectangle detecting means 4, the character candidate rectangle detecting means 5, the character spacing and line spacing detecting means 6, the integration distance calculating means 14, the line area determining means 7, the character area detecting means 8, and the character recognizing means 9; and 16 is a recognition information storage unit that stores the reference value set in the character integration strength setting unit 13, the circumscribed rectangles detected by the circumscribed rectangle detecting means 4, the character candidate rectangles detected by the character candidate rectangle detecting means 5, the character spacing and line spacing detected by the character spacing and line spacing detecting means 6, the integration distance calculated by the integration distance calculating means 14, the line areas determined by the line area determining means 7, the character areas detected by the character area detecting means 8, and the recognition result collated by the character recognizing means 9.
[0014] In FIG. 2, 17 is a scanner serving as the image input unit 1; 18 is a CRT serving as the recognition result output unit 12; 19 is a keyboard for giving commands to the central processing unit 21 from outside; 20 is a mouse for giving commands to the central processing unit 21 from outside; 21 is a central processing unit (abbreviated CPU) that controls the scanner 17, the CRT 18, the keyboard 19, the mouse 20, and the random access memory 25; 22 is an image storage unit that stores image data; 23 is a recognition dictionary unit that stores the graphic features of characters; 24 is a recognition information storage unit that stores the recognition information obtained by each means in the control unit 15; 25 is a random access memory comprising the image storage unit 22, the recognition dictionary unit 23, and the recognition information storage unit 24; and 26 is a system bus connecting the scanner 17, the CRT 18, the keyboard 19, the mouse 20, the central processing unit 21, and the random access memory 25.
[0015] The operation of the character recognition device configured as above is described below with reference to the drawings. FIG. 3 is a flowchart showing the operation of the character recognition device according to an embodiment of the present invention, and FIG. 4 shows the CRT screen for setting the character integration strength in the same embodiment. First, as shown in FIG. 4, one of three character integration strengths is selected in the character integration strength setting unit 13: strong (hereinafter CHAR_COMB1; here CHAR_COMB1 = 2), normal (hereinafter CHAR_COMB2; here CHAR_COMB2 = 12/5), or weak (hereinafter CHAR_COMB3; here CHAR_COMB3 = 3) (S1). Next, image data containing character data is read by the image input unit 1 and stored in the image storage unit 2 (S2). Next, the image data stored in the image storage unit 2 is reduced (S3). Next, the circumscribed rectangles that circumscribe the black pixel regions of the image data stored in the image storage unit 2 are extracted, and the upper-left coordinates (x1, y1) and lower-right coordinates (x2, y2) of each circumscribed rectangle are stored in the recognition information storage unit 16 as the circumscribed rectangle information (S4). Next, if the aspect ratio of a circumscribed rectangle is at least the threshold FD_RATIO (here FD_RATIO = 30), the circumscribed rectangle is treated as a ruled line candidate area. Ruled line candidate areas whose coordinates intersect are also searched for, and intersecting ruled line candidate areas are merged into a table candidate area. If the long side of a circumscribed rectangle is at least the threshold CHAR_MAX (here CHAR_MAX = 100) and its black pixel density is at most the threshold PER_DIAG_MIN (here PER_DIAG_MIN = 15), the circumscribed rectangle is treated as a figure candidate area. If the long side of a circumscribed rectangle is at least the threshold CHAR_MAX and its black pixel density is at least the threshold PER_DIAG_MAX (here PER_DIAG_MAX = 80), the circumscribed rectangle is treated as an image candidate area. Circumscribed rectangles not identified as ruled line candidate areas, table candidate areas, figure candidate areas, or image candidate areas are treated as character candidate rectangles, and each area is stored in the recognition information storage unit 16 (S5). Next, the horizontal gap between character candidate rectangles (here called the character spacing) and the vertical gap between character candidate rectangles (here called the line spacing) are detected and stored in the recognition information storage unit 16 (S6). Next, the integration distance is calculated by multiplying the mode of the character spacing by whichever of CHAR_COMB1, CHAR_COMB2, or CHAR_COMB3 was selected, and is stored in the recognition information storage unit 16 (S7). Next, character candidate rectangles whose character spacing is shorter than the integration distance are determined to belong to the same line area, and the result is stored in the recognition information storage unit 16 (S8). Next, character areas are detected from the line areas by merging them line by line, and are stored in the recognition information storage unit 16 (S9). Next, the character areas detected by the character area detecting means 8 are collated with the graphic features stored in the recognition dictionary unit 3 to obtain a recognition result, which is stored in the recognition information storage unit 16 (S10). Finally, the recognition result collated by the character recognizing means 9 is output (S11), and processing ends.
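The classification in step S5 can be illustrated with a short Python sketch. The threshold values are those given in paragraph [0015]; everything else is an assumption made for illustration: the function name `classify_rectangle`, the reading of the aspect ratio as long side divided by short side, CHAR_MAX being in pixels, and the black pixel density being a percentage. The merging of intersecting ruled line candidates into table candidates is omitted.

```python
FD_RATIO = 30        # aspect-ratio threshold for ruled line candidates
CHAR_MAX = 100       # long-side threshold (assumed to be in pixels)
PER_DIAG_MIN = 15    # density at or below this -> figure (assumed percent)
PER_DIAG_MAX = 80    # density at or above this -> image (assumed percent)


def classify_rectangle(x1, y1, x2, y2, black_pixels):
    """Classify one circumscribed rectangle following the order of step S5.

    (x1, y1) is the upper-left corner and (x2, y2) the lower-right corner,
    both inclusive; black_pixels is the black pixel count inside it.
    """
    width = x2 - x1 + 1
    height = y2 - y1 + 1
    long_side = max(width, height)
    short_side = min(width, height)
    density = 100.0 * black_pixels / (width * height)

    if long_side / short_side >= FD_RATIO:
        return "ruled_line"
    if long_side >= CHAR_MAX and density <= PER_DIAG_MIN:
        return "figure"
    if long_side >= CHAR_MAX and density >= PER_DIAG_MAX:
        return "image"
    # Everything not matched above is a character candidate rectangle.
    return "character"
```

Note that, read literally, the rules leave a large rectangle of intermediate density (between 15 and 80 under the percent assumption) as a character candidate; the sketch reproduces that behaviour rather than guessing at an unstated rule.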
[0016] An example of circumscribed rectangles is shown below. FIG. 5 shows an example of image data in an embodiment of the present invention, and FIG. 6 shows the circumscribed rectangles of the image data shown in FIG. 5. The circumscribed rectangles detected by the circumscribed rectangle detecting means 4 from the example image data of FIG. 5 are shown in FIG. 6. In FIG. 6, K1 through K9 are the circumscribed rectangles.
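One common way to obtain circumscribed rectangles such as K1 through K9 is connected-component labeling followed by taking the bounding box of each component. The Python sketch below shows that generic technique; the patent does not state which algorithm the circumscribed rectangle detecting means 4 uses, so the function name `circumscribed_rectangles`, the list-of-lists bitmap with 1 for black, and the choice of 8-connectivity are all assumptions.

```python
from collections import deque


def circumscribed_rectangles(bitmap):
    """Return (x1, y1, x2, y2) for each cluster of connected black pixels.

    bitmap is a list of rows; 1 means black, 0 means white.
    Pixels touching horizontally, vertically, or diagonally are treated
    as connected (8-connectivity, an assumption).
    """
    height = len(bitmap)
    width = len(bitmap[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    rects = []
    for y in range(height):
        for x in range(width):
            if bitmap[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search over one cluster, tracking its extent.
            x1 = x2 = x
            y1 = y2 = y
            seen[y][x] = True
            queue = deque([(x, y)])
            while queue:
                cx, cy = queue.popleft()
                x1, x2 = min(x1, cx), max(x2, cx)
                y1, y2 = min(y1, cy), max(y2, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and bitmap[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            rects.append((x1, y1, x2, y2))
    return rects
```

Each tuple holds the upper-left and lower-right coordinates, matching the form in which step S4 stores the circumscribed rectangle information.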
[0017] As described above, this embodiment provides the character integration strength setting unit 13, which sets the reference value for determining line areas; the integration distance calculating means 14, which calculates the integration distance by multiplying the mode of the character spacing detected by the character spacing and line spacing detecting means 6 by the reference value set in the character integration strength setting unit 13; and the recognition information storage unit 16, which can store the reference value set in the character integration strength setting unit 13 and the integration distance calculated by the integration distance calculating means 14. The user can therefore observe the state of the image data and decide the reference value for determining line areas, so line areas can be determined accurately. As a result, the character data is recognized as correct sentences when it is edited, and character recognition itself is accurate and highly reliable.
[0018]
[Effects of the Invention] As described above, the present invention provides a character integration strength setting unit that sets the reference value for determining line areas; an integration distance calculating means that calculates the integration distance by multiplying the mode of the character spacing detected by the character spacing and line spacing detecting means by the reference value set in the character integration strength setting unit; and a recognition information storage unit that can store the reference value set in the character integration strength setting unit and the integration distance calculated by the integration distance calculating means. The user can therefore observe the state of the image data and decide the reference value for determining line areas, so line areas can be determined accurately. As a result, the character data is recognized as correct sentences when it is edited, and the invention achieves an excellent character recognition device capable of accurate and highly reliable character recognition.
[FIG. 1] Functional block diagram showing the configuration of a character recognition device according to an embodiment of the present invention
[FIG. 2] Device block diagram showing the configuration of the character recognition device according to an embodiment of the present invention
[FIG. 3] Flowchart showing the operation of the character recognition device according to an embodiment of the present invention
[FIG. 4] Diagram showing the CRT screen for setting the character integration strength in an embodiment of the present invention
[FIG. 5] Diagram showing an example of image data in an embodiment of the present invention
[FIG. 6] Diagram showing the circumscribed rectangles of the image data shown in FIG. 5 in an embodiment of the present invention
[FIG. 7] Block diagram showing the configuration of a conventional character recognition device
[FIG. 8] Diagram showing the conventional determination of line areas when the character spacing is large
[FIG. 9] Diagram showing the conventional determination of line areas when the character spacing is small
Description of Symbols
1 Image input unit
2 Image storage unit
3 Recognition dictionary unit
4 Circumscribed rectangle detecting means
5 Character candidate rectangle detecting means
6 Character spacing and line spacing detecting means
7 Line area determining means
8 Character area detecting means
9 Character recognizing means
10 Control unit
11 Recognition information storage unit
12 Recognition result output unit
13 Character integration strength setting unit
14 Integration distance calculating means
15 Control unit
16 Recognition information storage unit
17 Scanner
18 CRT
19 Keyboard
20 Mouse
21 Central processing unit
22 Image storage unit
23 Recognition dictionary unit
24 Recognition information storage unit
25 Random access memory
26 System bus
Claims (1)
1. A character recognition device comprising: an image input unit for inputting image data including character data; an image storage unit for storing the image data input to the image input unit; a recognition dictionary unit for storing graphic features of characters; circumscribing rectangle detection means for detecting circumscribing rectangles that circumscribe black pixel areas in the image data stored in the image storage unit; character candidate rectangle detection means for detecting character candidate rectangles from the circumscribing rectangles detected by the circumscribing rectangle detection means; character spacing and line spacing detection means for detecting the character spacing and the line spacing between the character candidate rectangles detected by the character candidate rectangle detection means; line area determination means for determining that character candidate rectangles belong to the same line area when the character spacing detected by the character spacing and line spacing detection means is shorter than an integrated distance; character area detection means for detecting a character area from the line area determined by the line area determination means; character recognition means for obtaining a recognition result by collating the character area detected by the character area detection means with the graphic features stored in the recognition dictionary unit; a recognition information storage unit for storing the circumscribing rectangles detected by the circumscribing rectangle detection means, the character candidate rectangles detected by the character candidate rectangle detection means, the character spacing and the line spacing detected by the character spacing and line spacing detection means, the line area determined by the line area determination means, the character area detected by the character area detection means, and the recognition result collated by the character recognition means; and a recognition result output unit for outputting the recognition result collated by the character recognition means;
the character recognition device being characterized by further comprising: a character integration strength setting unit for setting a reference value used to determine the line area; and integrated distance calculation means for calculating the integrated distance by multiplying the mode of the character spacing detected by the character spacing and line spacing detection means by the reference value set by the character integration strength setting unit; wherein the recognition information storage unit also stores the reference value set by the character integration strength setting unit and the integrated distance calculated by the integrated distance calculation means.
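The integrated-distance rule in the claim can be illustrated with a short Python sketch. This is not the patented implementation: the function names, the `(left, top, right, bottom)` rectangle layout, the restriction to a single horizontal band of text, and the tie-breaking rule for the mode are all assumptions made for illustration. The sketch takes the mode of the gaps between neighboring character candidate rectangles, multiplies it by the reference value, and keeps two neighbors in the same line area only when their gap is shorter than that product.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical rectangle layout: (left, top, right, bottom) in pixels.
Rect = Tuple[int, int, int, int]


def horizontal_gaps(rects: List[Rect]) -> List[int]:
    """Gaps between neighboring rectangles after sorting by left edge."""
    ordered = sorted(rects, key=lambda r: r[0])
    return [max(0, b[0] - a[2]) for a, b in zip(ordered, ordered[1:])]


def integrated_distance(gaps: List[int], strength: float) -> float:
    """Mode of the character spacing multiplied by the reference value.

    Ties between equally frequent gaps are broken by taking the smallest
    gap; this tie-breaking rule is an assumption, not part of the claim.
    """
    if not gaps:
        return 0.0
    counts = Counter(gaps)
    best = max(counts.values())
    mode = min(g for g, c in counts.items() if c == best)
    return mode * strength


def group_into_line_regions(rects: List[Rect], strength: float) -> List[List[Rect]]:
    """Merge neighbors whose gap is shorter than the integrated distance."""
    ordered = sorted(rects, key=lambda r: r[0])
    if not ordered:
        return []
    limit = integrated_distance(horizontal_gaps(ordered), strength)
    regions = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        gap = max(0, cur[0] - prev[2])
        if gap < limit:
            regions[-1].append(cur)   # same line area
        else:
            regions.append([cur])     # gap too wide: start a new line area
    return regions
```

In this sketch a larger reference value merges more widely spaced rectangles into one line area, while a smaller value splits text at wide gaps such as the gutter between columns.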
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP6191408A JPH0855185A (en) | 1994-08-15 | 1994-08-15 | Character recognition device |
Publications (1)
Publication Number | Publication Date |
---|---|
JPH0855185A (en) | 1996-02-27 |
Family
ID=16274115
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP6191408A Pending JPH0855185A (en) | 1994-08-15 | 1994-08-15 | Character recognition device |
Country Status (1)
Country | Link |
---|---|
JP (1) | JPH0855185A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7149352B2 (en) | 2000-06-28 | 2006-12-12 | Minolta Co., Ltd. | Image processing device, program product and system |
JP2009251872A (en) * | 2008-04-04 | 2009-10-29 | Fuji Xerox Co Ltd | Information processing device and information processing program |
JP2014164320A (en) * | 2013-02-21 | 2014-09-08 | Mitsubishi Electric Corp | Character recognition device and character recognition method |
1994-08-15: application JP6191408A filed in Japan, published as JPH0855185A (status: Pending)
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US7519226B2 (en) | Form search apparatus and method | |
JPH08235341A (en) | Method and device for document filing | |
JPH0721310A (en) | Document recognizing device | |
JP3232991B2 (en) | Character reading method and address reading method | |
JPH0855185A (en) | Character recognition device | |
JP3276555B2 (en) | Format recognition device and character reader | |
JPH07168911A (en) | Document recognition device | |
JPH07160810A (en) | Character recognizing device | |
JP3074691B2 (en) | Character recognition device | |
JP3060248B2 (en) | Table recognition device | |
JP3276554B2 (en) | Format recognition device and character reader | |
JPH0728935A (en) | Document image processor | |
JP3052438B2 (en) | Table recognition device | |
JP3000480B2 (en) | Character area break detection method | |
JP3160458B2 (en) | Character reading device and character reading method | |
JPH10214308A (en) | Character discrimination method | |
JP2972443B2 (en) | Character recognition device | |
JPH0721309A (en) | Document recognizing device | |
JPH08272909A (en) | Method and device for character recognition | |
JPH06150062A (en) | Character recognizing device | |
JPH04163681A (en) | Information processor and character recognizing device | |
JPH0628520A (en) | Character recognition device | |
JPH05298494A (en) | Method and device for recognizing character | |
JP2002074269A (en) | Method for recognizing character | |
JPH0696277A (en) | Alphabet recognizing device |