JPH05205104A - System for coupling blurred character - Google Patents

System for coupling blurred character

Info

Publication number
JPH05205104A
Authority
JP
Japan
Prior art keywords
character
area
blurred
areas
combined
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP4035699A
Other languages
Japanese (ja)
Other versions
JP2821303B2 (en)
Inventor
Hiroyuki Kami
博行 上
Yoichi Kobayashi
陽一 小林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NAGANO NIPPON DENKI SOFTWARE KK
NEC Corp
NEC Software Nagano Ltd
Original Assignee
NAGANO NIPPON DENKI SOFTWARE KK
NEC Corp
NEC Software Nagano Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NAGANO NIPPON DENKI SOFTWARE KK, NEC Corp, NEC Software Nagano Ltd
Priority to JP4035699A
Publication of JPH05205104A
Application granted
Publication of JP2821303B2
Anticipated expiration
Expired - Fee Related

Landscapes

  • Character Input (AREA)

Abstract

PURPOSE: To accurately combine the character areas belonging to a blurred character, whether the character is written in a normal (upright) typeface or in an italic typeface. CONSTITUTION: Area information examining means 13 obtains, for the character line image to be processed, area information including the size of each character area, the size and coordinates of each contact area, the distance between contact areas, and the combined character width. Blurred character determining means 14 determines from this area information whether the character in a character area is part of a blurred character. Character area combining means 15 generates a combined area by combining adjacent character areas whose characters were determined by the blurred character determining means 14 to be part of a blurred character.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a blurred character combining method, and more particularly to a blurred character combining method for use in an English printed-character recognition system.

[0002]

[Prior Art] Conventional blurred character combining methods of this kind do not take into account the presence of italic characters, that is, characters slanted at an angle. They process the input on the assumption that only upright (normal-style) characters are present.

[0003] A conventional blurred character combining method will be described with reference to FIGS. 2(a) to 2(f).

[0004] Assume that, in an English printed-character recognition system, the character segmentation method that precedes the blurred character combining method has determined the character areas and the estimated character pitch P for the character line image (the image of one line of characters) shown in FIG. 2(a), as shown in FIGS. 2(b) and 2(c).

[0005] First, the estimated character width W and the estimated inter-character space S shown in FIG. 2(c) are obtained from the estimated character pitch P in the same figure. P, W and S are related by the following equations, in which $C_1$ and $C_2$ are constants determined by examining a number of actual character line images (or by carrying out suitable experiments).

$$W = C_1 \times P \quad (\text{where } 0 < C_1 < 1)$$

$$S = C_2 \times (P - W) \quad (\text{where } 1 > C_2)$$
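The relation between P, W and S can be sketched in a few lines of Python. This is only an illustration: the function name and the default values of `c1` and `c2` are assumptions, since the text states only that the constants are found empirically and that 0 < C1 < 1 and C2 < 1.

```python
def estimate_width_and_space(pitch: float, c1: float = 0.8, c2: float = 0.5) -> tuple[float, float]:
    """Return (W, S) for an estimated character pitch P.

    c1 and c2 stand for the constants C1 and C2; the defaults here are
    placeholder values, not values taken from the patent.
    """
    if not 0.0 < c1 < 1.0:
        raise ValueError("C1 must satisfy 0 < C1 < 1")
    if not c2 < 1.0:
        raise ValueError("C2 must satisfy C2 < 1")
    width = c1 * pitch            # W = C1 x P
    space = c2 * (pitch - width)  # S = C2 x (P - W)
    return width, space
```

For a pitch of 20 pixels and the placeholder constants, this gives a width of about 16 and a space of about 2.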

[0006] Meanwhile, the inter-character spaces shown in FIG. 2(d) and the combined character widths shown in FIG. 2(e) are obtained from the character areas shown in FIG. 2(b). The inter-character space is the length from the right side of one character area to the left side of the next character area. The combined character width is the length from the left side of one character area to the right side of the next character area.

[0007] Finally, the adjacent character areas to be combined (combined areas) are determined from the estimated character width W, the estimated inter-character space S, the inter-character spaces and the combined character widths. Specifically, the method searches for two adjacent character areas whose combined character width is smaller than the estimated character width W and whose inter-character space is smaller than the estimated inter-character space S, and the area formed by combining the two is taken as a combined area (see FIG. 2(f)).
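The conventional test described above can be sketched as follows. The `Box` type and the function names are hypothetical, and the sketch assumes character areas are axis-aligned rectangles with `left` and `right` measured in pixels from the left edge of the line image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned character area (hypothetical representation)."""
    left: int
    top: int
    right: int
    bottom: int


def inter_character_space(a: Box, b: Box) -> int:
    """Length from the right side of area a to the left side of the next area b."""
    return b.left - a.right


def combined_character_width(a: Box, b: Box) -> int:
    """Length from the left side of area a to the right side of the next area b."""
    return b.right - a.left


def conventional_should_combine(a: Box, b: Box, est_width: float, est_space: float) -> bool:
    """Conventional rule: combine when both measurements fall below W and S."""
    return (combined_character_width(a, b) < est_width
            and inter_character_space(a, b) < est_space)
```

Because the rule looks only at these two lengths, two separate italic characters whose boxes sit close together can pass it, which is the weakness the description goes on to discuss.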

[0008] In the blurred character combining method described above, adjacent character areas that satisfy the condition "the combined character width is smaller than the estimated character width W and the inter-character space is smaller than the estimated inter-character space S" are recognized as a combined area. Because italic characters are slanted, however, the inter-character space between adjacent character areas of italic characters is generally small, and so is their combined character width. Character areas of italic characters therefore often satisfy the above condition even though they are not character areas of blurred characters. In other words, the conventional method assumes that all characters to be processed are normal-style characters (ignoring the possibility of italic characters) and decides whether to combine character areas using the above condition alone.

[0009]

[Problems to Be Solved by the Invention] Because the conventional blurred character combining method described above does not take italic characters into account, it has the drawback that character areas of italic characters are combined erroneously.

[0010] In view of the above, an object of the present invention is to provide a blurred character combining method that can accurately combine the character areas of blurred characters in the same way, whether the blurred characters are normal-style characters or italic characters.

[0011]

[Means for Solving the Problems] The blurred character combining method of the present invention comprises: area information examining means for obtaining, for a character line image to be processed, area information including the size of each character area, the size and coordinates of each contact area, the inter-contact-area distance, and the combined character width; blurred character determining means for determining, on the basis of the area information obtained by the area information examining means, whether the character in a character area is part of a blurred character; and character area combining means for combining, on the basis of the determination result of the blurred character determining means, adjacent character areas whose characters have been determined to be part of a blurred character, thereby generating a combined area.

[0012]

[Operation] In the blurred character combining method of the present invention, the area information examining means obtains, for the character line image to be processed, area information including the size of each character area, the size and coordinates of each contact area, the inter-contact-area distance, and the combined character width. The blurred character determining means determines, on the basis of this area information, whether the character in a character area is part of a blurred character. The character area combining means then combines, on the basis of that determination result, adjacent character areas whose characters have been determined to be part of a blurred character, and generates a combined area.

[0013]

[Embodiment] The present invention will now be described in detail with reference to the drawings.

[0014] FIG. 1 is a block diagram showing the configuration of an embodiment of the blurred character combining method of the present invention. Within an English printed-character recognition system, the blurred character combining method of this embodiment is connected to a character segmentation method 11 and a character recognition method 16, both of which are existing methods. The blurred character combining method of this embodiment comprises character line image storage means 12, area information examining means 13, blurred character determining means 14, and character area combining means 15.

[0015] FIGS. 3(a) to 3(c) are diagrams for explaining the specific operation of the blurred character combining method of this embodiment.

[0016] FIGS. 4(a) and 4(b) are diagrams for explaining the processing performed by the area information examining means 13.

[0017] FIG. 5 is a flowchart showing the processing of the blurred character determining means 14. The processing consists of a character area extraction step 51, a character line image processing end determination step 52, a combined character width determination step 53, a contact area size determination step 54, a contact area proximity determination step 55, an aspect ratio range determination step 56, a character height to line height ratio determination step 57, and a combined area recognition step 58.

[0018] FIGS. 6(a) to 6(e) are diagrams for specifically explaining the processing of the blurred character determining means 14.

[0019] Next, the operation of the blurred character combining method of this embodiment, configured as above, will be described. The description covers the case of processing a character line image containing the italic text "falling under", as shown in FIGS. 3 and 4.

[0020] The character line image storage means 12 stores any one character line image from the document image being processed by the English printed-character recognition system. Here, the character line image storage means 12 is assumed to store the character line image shown in FIG. 3(a).

[0021] The character segmentation method 11 determines the character areas, the line height and the estimated character pitch for this character line image. FIG. 3(b) shows the set of character areas obtained by the character segmentation method 11.

[0022] The area information examining means 13 obtains area information by referring to the character line image stored in the character line image storage means 12 and the character areas determined by the character segmentation method 11. The area information comprises the size (width and height) of each character area, the size and coordinates of the contact areas in each character area, the inter-contact-area distance for each pair of adjacent character areas, and the combined character width for each pair of adjacent character areas.

[0023] The processing by which the area information examining means 13 obtains the area information will be described in detail with reference to FIGS. 4(a) and 4(b).

[0024] FIG. 4(a) shows the character line image stored in the character line image storage means 12 overlaid with the character areas obtained by the character segmentation method 11.

[0025] FIG. 4(b) is an enlarged view of the region enclosed by the dash-dot line in FIG. 4(a).

[0026] In FIG. 4(b), the width of character area $R_1$ is w and its height is h. These values serve as the basic criteria for determining whether the character in character area $R_1$ is part of a blurred character.

[0027] A contact area is a region on the right or left side of a character area where black pixels of the character in that area touch the side (a single side of a single character area may have several contact areas). In FIG. 4(b) it is the region (a straight-line segment) from point $rs_i$ to point $re_i$. Here $i = 0, \ldots, 2N-1$, where N is the number of character areas in the character line image; FIG. 4(b) shows the range $0 \le i \le 3$.

[0028] The coordinates of point $rs_i$ are written $(rs_i x,\ rs_i y)$ and those of point $re_i$ are written $(re_i x,\ re_i y)$, both with the origin at the upper-left point of the character line image. These coordinates satisfy $rs_i x = re_i x$ and $rs_i y \le re_i y$.

[0029] The size of a contact area is the interval $rw_i$ between point $rs_i$ and point $re_i$, given by the following equation.

$$rw_i = re_i y - rs_i y \quad (\text{where } rw_i \ge 0)$$

[0030] The coordinates of a contact area are the coordinates of points $rs_i$ and $re_i$, namely $(rs_i x,\ rs_i y)$ and $(re_i x,\ re_i y)$.
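One way to find the contact areas on one side of a character area is to scan the pixel column on that side for vertical runs of black pixels. The sketch below assumes a binary image stored as a list of rows, indexed `image[y][x]` with 1 for black, and a `bottom` row that is exclusive; these conventions and the function names are assumptions for illustration, not details given in the text.

```python
def contact_areas(image, x, top, bottom):
    """Return contact areas on column x as pairs (rs, re) of (x, y) points.

    Each pair marks the first and last black pixel of one vertical run
    between rows top (inclusive) and bottom (exclusive).
    """
    runs = []
    start = None
    for y in range(top, bottom):
        if image[y][x]:
            if start is None:
                start = y          # a new run of black pixels begins
        elif start is not None:
            runs.append(((x, start), (x, y - 1)))
            start = None
    if start is not None:          # run reaches the bottom of the area
        runs.append(((x, start), (x, bottom - 1)))
    return runs


def contact_size(run):
    """Size rw = (y of re) - (y of rs), following the formula in the text."""
    (_, rs_y), (_, re_y) = run
    return re_y - rs_y
```

Under the formula in the text, a contact area that is a single pixel high has size 0, which is consistent with the stated condition that the size is at least 0.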

[0031] The inter-contact-area distance $d_j$, which is the distance between the contact area on the right side of a character area $R_j$ and the contact area on the left side of the next character area $R_{j+1}$, is obtained from the following equation, where $j = 1, \ldots, N-1$ and $k = 2j-1$. When one side of a character area has several contact areas, the distance is computed for every combination of contact areas on the facing sides of the adjacent character areas, and the minimum of these distances is taken as the final inter-contact-area distance.

$$d_j = \frac{1}{2}\left[\left\{(rs_{k+1}x - rs_k x)^2 + (rs_{k+1}y - rs_k y)^2\right\}^{1/2} + \left\{(re_{k+1}x - re_k x)^2 + (re_{k+1}y - re_k y)^2\right\}^{1/2}\right]$$
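The distance formula and the minimum over all combinations can be sketched as follows. Contact areas are represented as pairs `(rs, re)` of `(x, y)` points; that representation, the function names, and the choice to return `None` when either side has no contact area are assumptions, because the text does not say what happens in that case.

```python
import math
from itertools import product


def pair_distance(right_run, left_run):
    """Mean of the start-to-start and end-to-end Euclidean distances."""
    rs_k, re_k = right_run      # contact area on the right side of R_j
    rs_k1, re_k1 = left_run     # contact area on the left side of R_(j+1)
    return 0.5 * (math.dist(rs_k, rs_k1) + math.dist(re_k, re_k1))


def contact_distance(right_runs, left_runs):
    """Smallest pair distance over every combination of facing contact areas."""
    if not right_runs or not left_runs:
        return None             # assumption: undefined without contact areas
    return min(pair_distance(r, l) for r, l in product(right_runs, left_runs))
```

Averaging the two endpoint distances makes the measure sensitive to vertical offset as well as horizontal gap, so contact areas that sit at different heights on neighbouring characters come out as farther apart.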

[0032] The combined character width is the length from the left side of one character area to the right side of the next character area. In FIG. 4(b) it is the value indicated by cw (the length from the left side of character area $R_1$ to the right side of character area $R_2$).

[0033] The area information examining means 13 obtains and outputs the area information described above, which includes the size of each character area, the size and coordinates of each contact area, the inter-contact-area distance, and the combined character width.

[0034] The blurred character determining means 14 performs determination processing for recognizing combined areas, using the area information obtained by the area information examining means 13 together with the line height and the estimated character pitch obtained by the character segmentation method 11.

[0035] That is, the blurred character determining means 14 refers to the area information, the line height of the character line image and the estimated character pitch, determines whether the character in each character area is part of a blurred character (if the character in a character area is part of a blurred character, that character area belongs to a combined area), and outputs the determination result.

[0036] The processing of the blurred character determining means 14 will be described in detail with reference to FIGS. 5 and 6. The blurred character determining means 14 performs the following processing.

[0037] First, two adjacent character areas are extracted from the character line image in the character line image storage means 12, proceeding in order from the first character area (step 51).

[0038] It is then determined whether processing of the one-line character line image has finished (step 52).

[0039] If this determination finds that processing of the one-line character line image has finished, the processing ends.

[0040] If step 52 determines that processing of the one-line character line image has not finished, it is determined whether the combined character width of the two character areas under examination is smaller than the estimated character pitch (step 53). This test is made because, if the characters in the two character areas are parts of a blurred character, the combined character width of those areas is likely to be smaller than the estimated character pitch.

[0041] If step 53 determines that the combined character width is not smaller than the estimated character pitch, the characters in the character areas under examination are unlikely to be parts of a blurred character. Processing therefore returns to step 51 and moves on to the next pair of character areas (the right-hand one of the two character areas just processed and the character area that follows it on the right).

[0042] If step 53 determines that the combined character width is smaller than the estimated character pitch, it is determined whether the contact areas on the adjacent sides of the two character areas under examination are small (step 54). The value tested is the sum of the sizes of the contact areas on the adjacent sides of the two character areas. This test is made because, with Gothic-style characters and the like, characters may not be blurred even when their contact areas are close (see the test in step 55 below), and in such cases the contact areas on the adjacent sides of the character areas are generally large (see FIG. 6(a)).

[0043] If step 54 determines that the contact areas are not small, the characters in the character areas under examination are unlikely to be parts of a blurred character, so processing returns to step 51 and moves on to the next pair of character areas.

[0044] If step 54 determines that the contact areas are small, it is determined whether the contact areas of the two character areas under examination are close to each other (step 55). This test is made because, even when the results of steps 53 and 54 are "Yes" (that is, even when the combined character width is small and so on), it is not appropriate to combine two character areas that belong to separate italic characters, such as those shown in FIG. 6(b). The contact areas of two character areas belonging to adjacent, separate italic characters are generally far apart, and character areas that should not be combined must not be combined by mistake.

[0045] If step 55 determines that the contact areas are not close, the characters in the character areas under examination are unlikely to be parts of a blurred character, so processing returns to step 51 and moves on to the next pair of character areas.

[0046] If step 55 determines that the contact areas are close, it is determined whether the aspect ratio of the characters in the character areas under examination is within the set range, that is, whether "height"/"width" is larger than a set value (step 56). The test is applied to the characters in both character areas. This test is made because, even when the results of steps 53 to 55 are "Yes", it is not appropriate to combine two character areas containing characters such as the "-" shown in FIG. 6(c).

[0047] If step 56 determines that the aspect ratio of the characters is not within the set range (the case shown in FIG. 6(c)), the characters in the character areas under examination are unlikely to be parts of a blurred character, so processing returns to step 51 and moves on to the next pair of character areas.

[0048] If step 56 determines that the aspect ratio of the characters is within the set range, it is determined whether the ratio of the character height of the characters in the character areas under examination to the line height of the character line image is small, that is, whether "character height"/"line height" is smaller than a fixed set value (step 57). This test is made because, even when the results of steps 53 to 56 are "Yes", it is not appropriate to combine two character areas containing characters such as the "I" shown in FIG. 6(d).

[0049] If step 57 determines that the ratio of character height to line height is not small, the characters in the character areas under examination are unlikely to be parts of a blurred character, so processing returns to step 51 and moves on to the next pair of character areas.

[0050] If step 57 determines that the ratio of character height to line height is small, all the conditions of steps 53 to 57 (the conditions for determining that the characters in the character areas under examination are parts of a blurred character) are satisfied, so the two character areas under examination are recognized as a combined area (step 58). For example, the two character areas belonging to a blurred "u" or a blurred "n" shown in FIG. 6(e) are recognized as a combined area. In the example of FIG. 3(b), the adjacent pairs of character areas marked with arrows are recognized as combined areas.
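The cascade of steps 53 to 58 can be sketched as a chain of early-exit tests. Everything named here (`PairInfo`, `Thresholds`, the field names) is hypothetical, the threshold values must come from examining real line images, and applying the step 57 test to both character areas is an assumption carried over from the wording of step 56.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairInfo:
    """Area information for one pair of adjacent character areas."""
    combined_width: float
    contact_size_total: float            # sum over the facing sides
    contact_distance: float
    aspect_ratios: tuple[float, float]   # height / width of each character
    char_heights: tuple[float, float]


@dataclass(frozen=True)
class Thresholds:
    """Empirical limits; the patent gives no concrete numbers."""
    max_contact_size: float
    max_contact_distance: float
    min_aspect_ratio: float
    max_height_ratio: float


def is_blurred_pair(info, pitch, line_height, th):
    if not info.combined_width < pitch:                          # step 53
        return False
    if not info.contact_size_total < th.max_contact_size:        # step 54
        return False
    if not info.contact_distance < th.max_contact_distance:      # step 55
        return False
    if not all(r > th.min_aspect_ratio for r in info.aspect_ratios):  # step 56
        return False
    if not all(h / line_height < th.max_height_ratio
               for h in info.char_heights):                      # step 57
        return False
    return True                                                  # step 58


def find_blurred_pairs(infos, pitch, line_height, th):
    """Indices j such that areas j and j+1 should be combined (steps 51, 52)."""
    return [j for j, info in enumerate(infos)
            if is_blurred_pair(info, pitch, line_height, th)]
```

Ordering the tests as in the flowchart means a pair is rejected at the first failed condition, so the later measurements only matter for pairs that already look narrow enough to be one character.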

[0051] The specific numerical thresholds used for the "small/large", "near/far" and similar judgments in the above determination processing are decided on the basis of examination of actual character line images and the like.

[0052] On the basis of the determination result output from the blurred character determining means 14, the character area combining means 15 combines each pair of adjacent character areas whose characters have been determined to be parts of a blurred character into a combined area (see FIG. 3(c)), and passes the resulting set of character areas (a set in which each combined area is treated as a single character area) to the character recognition method 16.
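The combining step can be sketched as taking the bounding box of each flagged pair. Boxes are `(left, top, right, bottom)` tuples and `merge_flags[j]` says whether areas j and j+1 were judged to be parts of one blurred character; both conventions are assumptions. The sketch also assumes that consecutive flags chain into a single area, a case the text does not address.

```python
def merge_regions(boxes, merge_flags):
    """Combine flagged neighbours and return the new list of character areas.

    boxes: list of (left, top, right, bottom), ordered left to right.
    merge_flags: list of len(boxes) - 1 booleans for each adjacent pair.
    """
    if not boxes:
        return []
    merged = [boxes[0]]
    for j in range(1, len(boxes)):
        if merge_flags[j - 1]:
            l1, t1, r1, b1 = merged[-1]
            l2, t2, r2, b2 = boxes[j]
            # The combined area is the bounding box of the two areas.
            merged[-1] = (min(l1, l2), min(t1, t2), max(r1, r2), max(b1, b2))
        else:
            merged.append(boxes[j])
    return merged
```

The returned list has the same form as the input, so a recognizer that consumes character areas needs no special handling for combined ones.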

[0053] The character recognition method 16 recognizes the characters in that set of character areas, referring to the character line image in the character line image storage means 12 as it does so.

[0054]

[Effects of the Invention] As described above, by combining the character areas of blurred characters with the presence of italic characters taken into account, the present invention makes it possible to combine such character areas accurately for blurred normal-style characters and blurred italic characters alike. This in turn makes it possible to improve the accuracy of the character recognition performed by the character recognition method.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.

FIG. 2 is a diagram for explaining a conventional blurred character combining method.

FIG. 3 is a diagram for explaining the operation of the blurred character combining method shown in FIG. 1.

FIG. 4 is a diagram for explaining the processing by which the area information examining means in FIG. 1 obtains area information.

FIG. 5 is a flowchart showing the processing of the blurred character determining means in FIG. 1.

FIG. 6 is a diagram for specifically explaining the processing of the blurred character determining means shown in FIG. 5.

[Explanation of Reference Numerals]

11 character segmentation method
12 character line image storage means
13 area information examining means
14 blurred character determining means
15 character area combining means
16 character recognition method

Claims (1)

[Claims]

Claim 1. A blurred character combining method comprising: area information examining means for obtaining, for a character line image to be processed, area information including the size of each character area, the size and coordinates of each contact area, the inter-contact-area distance, and the combined character width; blurred character determining means for determining, on the basis of the area information obtained by the area information examining means, whether the character in a character area is part of a blurred character; and character area combining means for combining, on the basis of the determination result of the blurred character determining means, adjacent character areas whose characters have been determined to be part of a blurred character, thereby generating a combined area.
JP4035699A 1992-01-27 1992-01-27 Blurred character combination method Expired - Fee Related JP2821303B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP4035699A JP2821303B2 (en) 1992-01-27 1992-01-27 Blurred character combination method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP4035699A JP2821303B2 (en) 1992-01-27 1992-01-27 Blurred character combination method

Publications (2)

Publication Number Publication Date
JPH05205104A (en) 1993-08-13
JP2821303B2 (en) 1998-11-05

Family

ID=12449133

Family Applications (1)

Application Number Title Priority Date Filing Date
JP4035699A Expired - Fee Related JP2821303B2 (en) 1992-01-27 1992-01-27 Blurred character combination method

Country Status (1)

Country Link
JP (1) JP2821303B2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1324521C (en) * 2003-03-15 2007-07-04 三星电子株式会社 Preprocessing equipment and method for distinguishing image character
US7766026B2 (en) 2006-10-27 2010-08-03 Boey Kum F Faucet control system and method

Also Published As

Publication number Publication date
JP2821303B2 (en) 1998-11-05

Similar Documents

Publication Publication Date Title
JP3308032B2 (en) Skew correction method, skew angle detection method, skew correction device, and skew angle detection device
US6577763B2 (en) Document image recognition apparatus and computer-readable storage medium storing document image recognition program
EP0843275A2 (en) Pattern extraction apparatus and method for extracting patterns
CN108596168B (en) Method, apparatus and medium for recognizing characters in image
JPH0713995A (en) Automatic determination device of feature of text
JPH07234915A (en) Image recognizing device
JP4395188B2 (en) Document image recognition apparatus and storage medium for document image recognition program
CN109035256A (en) User interface image cutting method, device, server and storage medium
JPH05205104A (en) System for coupling blurred character
JP3303246B2 (en) Image processing device
JP3090342B2 (en) Character string direction discriminator
JP2007295210A (en) Image processing apparatus, image processing method, image processing program, and recording medium recording the program
JPH0728935A (en) Document image processor
JPH07230525A (en) Method for recognizing ruled line and method for processing table
JPH0822507A (en) Document recognition device
US10878271B2 (en) Systems and methods for separating ligature characters in digitized document images
JP2902097B2 (en) Information processing device and character recognition device
JP3104355B2 (en) Feature extraction device
JPH07141465A (en) Method for detecting inclination of document image
JPS6343788B2 (en)
JPH11232463A (en) Picture recognizing device and method therefor
JP3226355B2 (en) Recognition result evaluation method
JP2000222577A (en) Method and device for ruled line processing, and recording medium
JP2022051198A (en) Ocr processor, ocr processing method, and program
JP3919390B2 (en) Character recognition device

Legal Events

Date Code Title Description
S531 Written request for registration of change of domicile

Free format text: JAPANESE INTERMEDIATE CODE: R313531

R350 Written notification of registration of transfer

Free format text: JAPANESE INTERMEDIATE CODE: R350

LAPS Cancellation because of no payment of annual fees