JPH0365781A - Pattern normalizing system - Google Patents

Pattern normalizing system

Info

Publication number
JPH0365781A
Authority
JP
Japan
Prior art keywords
character
width
normalized
height
row
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP1201393A
Other languages
Japanese (ja)
Inventor
Hiroyuki Asakawa
浅川 浩之
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc
Priority to JP1201393A
Publication of JPH0365781A

Abstract

PURPOSE: To improve the recognition rate by cutting out each character or symbol pattern in a rectangular area whose reference side is the maximum height (or width) of the row (or column) containing the character and whose other side is the extent over which the pattern's dots exist, and by scaling both the height and the width of the pattern by the ratio of the reference side to the normalization size. CONSTITUTION: A character line detection unit 3 detects a character line, and a maximum height detection unit 5 calculates the maximum height of that line. A character rectangle detection unit 6 detects a character rectangle, and a character width detection unit 8 detects the width of the character. The rectangular area is defined so that either the maximum height or the maximum width of the row or column containing the character or symbol to be normalized is the reference side, and the other side is the extent over which the dots of the pattern of interest exist. When the pattern of interest is cut out in this rectangular area and normalized by a character normalization unit 9, its height and width are both scaled by the ratio of the reference side to the normalization size. Because the character is normalized while retaining its original features, the character recognition rate improves.

Description

DETAILED DESCRIPTION OF THE INVENTION [Field of Industrial Application] The present invention relates to a pattern normalization method, and more particularly to a method for normalizing the character patterns to be recognized, performed at the stage preceding character recognition.

[Prior Art] Conventionally, a character pattern to be recognized has been normalized by removing the white-pixel portions above and below, or to the left and right of, the rectangular character pattern and then calculating separate normalization ratios for the vertical and horizontal directions.

[Problem to Be Solved by the Invention] However, when normalization is performed by the conventional processing described above, that is, when the white-pixel portions are removed and the pattern is scaled separately in the vertical and horizontal directions, characters and symbols positioned toward the top, bottom, left, or right lose that positional information, which is a problem.

For example, the overline, the minus sign, and the underline all ended up as a solid block ("■"). Likewise, the full-size kana "つ" and its small form ("っ") could not be distinguished from each other, because size information is lost from the result of conventional normalization.

The present invention has been made in view of this problem, and its object is to provide a pattern normalization method that makes it possible to obtain good recognition results.

[Means for Solving the Problem] and [Operation] The pattern normalization method of the present invention that solves this problem has the following configuration. Namely, in a method for normalizing segmented character patterns, performed as preprocessing for character recognition, the character or symbol pattern of interest is cut out in a rectangular area in which either the maximum height or the maximum width of the row or column containing the character or symbol to be normalized is the reference side and the other side is the extent over which the dots of that pattern exist; when the cut-out pattern is normalized, its size in both the height and width directions is scaled by the ratio of the reference side to the normalization size.

[Embodiment] An embodiment of the present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 is a schematic configuration diagram of the apparatus of the embodiment. This embodiment targets character recognition of horizontally written documents.

First, a document image is read by an input unit 1, such as a scanner, and stored in an image memory 2. A character line detection unit 3 detects a character line in the image memory 2 and stores it in a character line buffer 4. A maximum height detection unit 5 then obtains the maximum height of the detected character line. A character rectangle detection unit 6 detects character rectangles from the character line buffer 4 and stores them in a character rectangle buffer 7. Once a character rectangle has been detected, a character width detection unit 8 detects the width of the character, and this width is input, together with the previously detected height of the character line, to a character normalization unit 9, which normalizes the character. A feature extraction unit 10 then extracts parameters representing the features of the normalized character, and a recognition unit 11 recognizes the character by comparing these parameters with a dictionary 12 that stores previously obtained features of each character.

FIG. 2 is a flowchart showing the processing of the normalization unit.

In step S1, the maximum height of the character line is taken as h, and in step S2 the width of the character rectangle is taken as d. In step S3, the larger of h and d is taken as the reference side W. In step S4, the normalization ratio r = ℓ / W is determined from the predetermined normalization size ℓ and the W just obtained. In step S5, image normalization is performed: the height and the width of the character rectangle are each multiplied by r to obtain the normalized character rectangle.
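Steps S1 to S4 reduce to a few lines of arithmetic. The following Python sketch is illustrative only: the function names and the rounding to whole pixels are assumptions made for the example and are not part of the disclosure.

```python
def normalization_ratio(line_height: int, char_width: int, norm_size: int) -> float:
    """Steps S1-S4: choose the reference side W = max(h, d) and return r = l / W."""
    if line_height <= 0 or char_width <= 0 or norm_size <= 0:
        raise ValueError("all sizes must be positive")
    reference_side = max(line_height, char_width)  # step S3
    return norm_size / reference_side              # step S4


def scaled_size(line_height: int, char_width: int, norm_size: int):
    """Step S5 (sizes only): multiply height and width by the same ratio r.

    Rounding to whole pixels is an assumption of this sketch.
    """
    r = normalization_ratio(line_height, char_width, norm_size)
    return round(line_height * r), round(char_width * r)


if __name__ == "__main__":
    # A narrow symbol in a 40-pixel-high line, normalized to 32 x 32.
    print(scaled_size(40, 8, 32))
```

Because a single ratio is applied to both sides, the longer side lands exactly on the normalization size and the shorter side keeps its proportion.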

The character line detection used in the above processing is assumed to employ a known technique, and a detailed description is omitted.

FIG. 3 shows an example of character rectangle detection results. The height of every character rectangle is set to h, the maximum height of the line (in the illustrated case, the height of the character "度").

In other words, the maximum height h is used as is for the height of every other character in the line. The maximum height h can be found by examining the distribution of the dots that make up the line of interest.

The width of each character rectangle is matched to the horizontal extent of the dots that make up that character. This, too, is obtained from the dot distribution.
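The rectangle construction described above can be sketched with projection profiles. In this Python sketch the line image is a list of rows of 0/1 values (1 = black dot). Splitting characters at empty columns is an assumption made only for illustration, since the text does not specify how the character rectangle detection unit separates characters; all function names are hypothetical.

```python
def dot_extent(profile):
    """Return (first, last) indices at which a projection profile is non-zero."""
    hits = [i for i, count in enumerate(profile) if count > 0]
    if not hits:
        raise ValueError("no black dots found")
    return hits[0], hits[-1]


def column_runs(line_img):
    """Split the line into runs of consecutive non-empty columns (assumed segmentation)."""
    cols = [sum(col) for col in zip(*line_img)]
    runs, start = [], None
    for x, count in enumerate(cols):
        if count and start is None:
            start = x
        elif not count and start is not None:
            runs.append((start, x - 1))
            start = None
    if start is not None:
        runs.append((start, len(cols) - 1))
    return runs


def character_rectangles(line_img):
    """Each rectangle spans the full line height and the character's own dot width."""
    top, bottom = dot_extent([sum(row) for row in line_img])  # maximum height h of the line
    return [(left, top, right, bottom) for left, right in column_runs(line_img)]


if __name__ == "__main__":
    line = [
        [0, 1, 1, 0, 0, 0],
        [0, 1, 1, 0, 0, 0],
        [0, 1, 1, 0, 1, 0],  # the single dot at column 4 plays the role of a period
        [0, 0, 0, 0, 0, 0],
    ]
    print(character_rectangles(line))
```

The small dot receives a rectangle as tall as the tall character beside it, which is what preserves its vertical position for the later scaling step.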

Suppose that the above processing has determined the rectangle of a period to have height h and width d, as shown in FIG. 4(a). Of h and d, h is the larger. Therefore, to bring this rectangle to the normalization size ℓ × ℓ, scaling is performed with the ratio ℓ/h as the normalization ratio.

That is, not only is the rectangle's height h multiplied by the normalization ratio ℓ/h, but the width d is also multiplied by ℓ/h, giving a pattern of normalized size as shown in FIG. 4(b) (which illustrates the normalization process). The scaling of the pattern itself can be done by interpolating between dots.

The width after scaling, d' (= d × ℓ/h), satisfies d' < ℓ, so a blank area (consisting of white pixels) of width (ℓ − d')/2 is added on both the left and the right to obtain a normalized character rectangle of size ℓ × ℓ.
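The padding arithmetic for the case h ≥ d can be written out as follows. This is a hedged Python sketch: the function name, the rounding of d' to whole pixels, and the choice to give the extra pixel to the right side when the blank width is odd are all assumptions of the example.

```python
def pad_to_square(line_height: int, char_width: int, norm_size: int):
    """Return (scaled_width, left_pad, right_pad) for a rectangle with h >= d."""
    if char_width > line_height:
        raise ValueError("this sketch covers only the case h >= d described in the text")
    scaled_width = round(char_width * norm_size / line_height)  # d' = d * l / h
    blank = norm_size - scaled_width                            # l - d'
    left_pad = blank // 2
    right_pad = blank - left_pad  # absorbs the odd pixel, if any (assumption)
    return scaled_width, left_pad, right_pad


if __name__ == "__main__":
    # h = 40, d = 8, l = 32
    print(pad_to_square(40, 8, 32))
```

The three returned values always sum to the normalization size, so the padded pattern is exactly ℓ pixels wide.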

Incidentally, conventional normalization was applied to the character rectangle that remains after the blank portions above, below, and to the left and right of the pattern of interest are removed. The pattern after such normalization therefore looks like FIG. 4(c), and it can no longer be determined whether or not it is a period.

With the normalization of this embodiment, however, the result is as shown in FIG. 4(b): a blank area remains in the upper part, so if features are extracted from this image, the pattern can be recognized as a period without attaching any other special information.

FIGS. 5(a) to 5(c) show the overline, minus sign, and underline patterns, respectively, after normalization to the normalization size. As illustrated, even after normalization each pattern keeps its characteristic position (one of its features) within the normalized area. Features extracted from the normalized pattern therefore allow the character to be recognized. In FIGS. 5(a) to 5(c), as explained for FIG. 4, the vertical and horizontal lengths are compared, and normalization uses the ratio between the longer of the two and the normalization size.

By contrast, for characters and figures whose height and width differ greatly, normalizing with separate vertical and horizontal scale factors distorts the original shape and position, and the result may be indistinguishable from other characters and symbols. For example, if the overline, minus sign, and underline are normalized in that conventional way, they all end up as shown in FIG. 5(d); this does not occur with the normalization of this embodiment.
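The contrast between the two approaches can be reproduced on toy bitmaps. The Python sketch below is an illustration under stated assumptions, not the disclosed implementation: it uses nearest-neighbour sampling in place of the dot interpolation mentioned in the text, centres the scaled pattern inside the ℓ × ℓ area, and all function names are hypothetical.

```python
def scale_nearest(img, new_h, new_w):
    """Resize a binary image by nearest-neighbour sampling at pixel centres."""
    h, w = len(img), len(img[0])
    return [[img[(2 * y + 1) * h // (2 * new_h)][(2 * x + 1) * w // (2 * new_w)]
             for x in range(new_w)]
            for y in range(new_h)]


def tight_crop(img):
    """Remove the blank rows and columns around the dots (conventional step)."""
    rows = [y for y, row in enumerate(img) if any(row)]
    cols = [x for x in range(len(img[0])) if any(row[x] for row in img)]
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]


def conventional_normalize(img, size):
    """Crop tightly, then scale height and width independently to size x size."""
    return scale_nearest(tight_crop(img), size, size)


def proposed_normalize(rect, size):
    """Scale a (line height x dot width) rectangle by one ratio, then pad."""
    h, d = len(rect), len(rect[0])
    ref = max(h, d)                          # reference side W
    new_h = max(1, round(h * size / ref))
    new_d = max(1, round(d * size / ref))
    scaled = scale_nearest(rect, new_h, new_d)
    top, left = (size - new_h) // 2, (size - new_d) // 2   # centring is an assumption
    out = [[0] * size for _ in range(size)]
    for y, row in enumerate(scaled):
        out[top + y][left:left + new_d] = row
    return out


def bar(rows, height=8, width=6):
    """Horizontal stroke occupying the given rows of a height x width rectangle."""
    return [[1 if y in rows else 0 for _ in range(width)] for y in range(height)]


if __name__ == "__main__":
    symbols = {"overline": bar({0, 1}), "minus": bar({3, 4}), "underline": bar({6, 7})}
    for name, rect in symbols.items():
        print(name, conventional_normalize(rect, 4), proposed_normalize(rect, 4))
```

With these inputs the conventional route yields the same solid block for all three symbols, while the single-ratio route leaves the stroke in the top, middle, or bottom rows respectively.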

As described above, according to this embodiment the normalized pattern retains its original features, so the recognition efficiency of the character recognition processing can be improved.

Although the embodiment has been described for a horizontally written document, for vertical writing it suffices to first detect the maximum width within the column and then cut out each character using this maximum width as its width.
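The horizontal and vertical cases differ only in which axis supplies the reference extent. A minimal Python sketch, assuming rectangles are given as (left, top, right, bottom) tuples and using a hypothetical function name and flag:

```python
def character_rectangle(dot_box, line_extent, vertical_writing=False):
    """Build the cut-out rectangle for one character.

    dot_box:     (left, top, right, bottom) of the character's own dots.
    line_extent: (start, end) of the whole line across its thickness, i.e.
                 top/bottom rows for a horizontal line, or left/right columns
                 for a vertical one.
    """
    left, top, right, bottom = dot_box
    start, end = line_extent
    if vertical_writing:
        # Maximum width of the column, the character's own vertical dot extent.
        return (start, top, end, bottom)
    # Maximum height of the row, the character's own horizontal dot extent.
    return (left, start, right, end)


if __name__ == "__main__":
    print(character_rectangle((12, 30, 15, 33), (5, 34)))
    print(character_rectangle((12, 30, 15, 33), (8, 20), vertical_writing=True))
```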

[Effects of the Invention] As described above, according to the present invention, patterns are normalized while retaining their original features, so the character recognition rate can be improved.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of the character recognition apparatus of the embodiment.
FIG. 2 is a flowchart showing the processing of the normalization unit.
FIG. 3 is a diagram showing the principle of character rectangle detection.
FIG. 4(a) is a diagram showing the result of the normalization of the embodiment.
FIG. 4(b) is a diagram showing the result of conventional normalization.
FIGS. 5(a) to 5(c) are diagrams showing results of the normalization of the embodiment.
FIG. 5(d) is a diagram showing the result of conventional normalization.
In the figures, 1 denotes a document image, 2 character line detection processing, 3 character rectangle detection processing, and 4 normalization processing of each character rectangle.

Claims (1)

[Claims] A pattern normalization method for normalizing segmented character patterns as preprocessing for character recognition, characterized in that the character or symbol pattern of interest is cut out in a rectangular area in which either the maximum height or the maximum width of the row or column containing the character or symbol to be normalized is the reference side and the other side is the extent over which the dots of the pattern of interest exist, and in that, when the cut-out pattern is normalized, the size of the pattern of interest in the height and width directions is scaled by the ratio of the reference side to the normalization size.
JP1201393A 1989-08-04 1989-08-04 Pattern normalizing system Pending JPH0365781A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP1201393A JPH0365781A (en) 1989-08-04 1989-08-04 Pattern normalizing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP1201393A JPH0365781A (en) 1989-08-04 1989-08-04 Pattern normalizing system

Publications (1)

Publication Number Publication Date
JPH0365781A true JPH0365781A (en) 1991-03-20

Family

ID=16440346

Family Applications (1)

Application Number Title Priority Date Filing Date
JP1201393A Pending JPH0365781A (en) 1989-08-04 1989-08-04 Pattern normalizing system

Country Status (1)

Country Link
JP (1) JPH0365781A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07152864A (en) * 1991-08-07 1995-06-16 Hokkaido Prefecture Graphic generating system for hand-written alphanumeric character recognition

Similar Documents

Publication Publication Date Title
US5212739A (en) Noise tolerant optical character recognition system
EP0854434B1 (en) Ruled line extracting apparatus for extracting ruled line from normal document image and method thereof
US5828771A (en) Method and article of manufacture for determining whether a scanned image is an original image or fax image
US20070041642A1 (en) Post-ocr image segmentation into spatially separated text zones
JPH0713995A (en) Automatic determination device of feature of text
US10169650B1 (en) Identification of emphasized text in electronic documents
Dori et al. Segmentation and recognition of dimensioning text from engineering drawings
US8311331B2 (en) Resolution adjustment of an image that includes text undergoing an OCR process
US5923782A (en) System for detecting and identifying substantially linear horizontal and vertical lines of engineering drawings
JPH05242300A (en) Method for processing document image
JPH0365781A (en) Pattern normalizing system
JP3476595B2 (en) Image area division method and image binarization method
JP2000090194A (en) Image processing method and image processor
JPH02116987A (en) Character recognizing device
JP2939985B2 (en) Image processing device
JP3091278B2 (en) Document recognition method
JP3122476B2 (en) Automatic document copy machine
JP2612383B2 (en) Character recognition processing method
JP2008234223A (en) Image processing apparatus, image processing method, program, and recording medium
JPH0697470B2 (en) Character string extractor
JPS60153567A (en) Method for extracting area in printed document picture
Haralick et al. Document structural decomposition
JPH0415776A (en) Extracting method for character size information
JP2963474B2 (en) Similar character identification method
JPH04346189A (en) Character string type identification device