JPH0365781A - Pattern normalizing system - Google Patents
Pattern normalizing system
- Publication number
- JPH0365781A
- Authority
- JP
- Japan
- Prior art keywords
- character
- width
- normalized
- height
- row
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Description
DETAILED DESCRIPTION OF THE INVENTION
[Field of Industrial Application]
The present invention relates to a pattern normalization method, and more particularly to a method for normalizing the character patterns to be recognized, performed at the stage preceding character recognition.
[Prior Art]
Conventionally, a character pattern to be recognized has been normalized by removing the white-pixel portions above and below, or to the left and right of, the rectangular character pattern, and then computing separate normalization ratios for the vertical and horizontal directions.
[Problem to Be Solved by the Invention]
However, when normalization is performed by the conventional processing described above, that is, when the white-pixel portions are removed and the pattern is scaled independently in the vertical and horizontal directions, characters and symbols that sit off-center vertically or horizontally lose that positional information.
For example, the overline, the minus sign, and the underline all ended up as "■". Likewise, a full-size "つ" and its small form could not be told apart in the recognition result, because size information is lost after conventional normalization.
The present invention has been made in view of this problem, and its object is to provide a pattern normalization method that makes good recognition results possible.
[Means for Solving the Problem] and [Operation]
The pattern normalization method of the present invention, which solves this problem, has the following configuration. In a method for normalizing segmented character patterns, performed as preprocessing for character recognition, one of the maximum height or the maximum width in the row or column in which the character or symbol to be normalized is located is taken as the reference side, and the other side is the extent over which dots of the character or symbol pattern of interest exist. The pattern of interest is cut out with this rectangular region. When the cut-out pattern is normalized, its size in both the height and width directions is scaled by the ratio between the reference side and the normalized size.
[Embodiments]
Embodiments of the present invention are described in detail below with reference to the accompanying drawings.
FIG. 1 is a schematic configuration diagram of the apparatus of the embodiment. This embodiment targets character recognition of horizontally written documents.
First, a document image is read by an input unit 1 such as a scanner and stored in an image memory 2. A character line detection unit 3 detects character lines in the image memory 2 and stores them in a character line buffer 4. A maximum height detection unit 5 then obtains the maximum height of each detected character line. A character rectangle detection unit 6 detects character rectangles from the character line buffer and stores them in a character rectangle buffer 7. Once a character rectangle has been detected, a character width detection unit 8 detects the width of the character. This width is input to a character normalization unit 9 together with the previously detected line height, and the character is normalized. A feature extraction unit 10 then extracts parameters representing the features of the normalized character, and a recognition unit 11 recognizes the character by comparing them with a dictionary 12 that stores the previously obtained features of each character.
FIG. 2 is a flowchart showing the processing of the character normalization unit.
In step S1, the maximum height of the character line is taken as h. In step S2, the rectangle width of the character is taken as d. In step S3, the larger of h and d is taken as the reference side W. In step S4, the normalization ratio r = ℓ/W is determined from the predetermined normalized size ℓ and the obtained W. In step S5, image normalization is performed: both the height and the width of the character rectangle are multiplied by r to obtain the normalized character rectangle.
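Steps S1 to S5 can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the patent's implementation: the function names and the use of `round` to obtain whole dot counts are assumptions made here.

```python
def normalization_ratio(h, d, size):
    """Steps S1-S4: pick the reference side W and the ratio r = size / W.

    h    -- maximum height of the character line (S1)
    d    -- dot width of the character of interest (S2)
    size -- predetermined normalized size (the patent's script-l)
    """
    w = max(h, d)      # S3: the larger of h and d becomes the reference side W
    r = size / w       # S4: normalization ratio
    return w, r


def normalized_rect(h, d, size):
    """Step S5, dimensions only: scale height and width by the same ratio r.

    Rounding to whole dots is an assumption of this sketch.
    """
    _, r = normalization_ratio(h, d, size)
    return round(h * r), round(d * r)


if __name__ == "__main__":
    # A narrow symbol (6 dots wide) in a 40-dot-high line, normalized to 32.
    print(normalized_rect(40, 6, 32))   # (32, 5)
```

Because a single ratio is applied to both sides, the aspect ratio of the cut-out rectangle is preserved; only the longer side reaches the full normalized size.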
In the processing described above, character lines are detected by a known method, so a detailed description is omitted.
FIG. 3 shows an example of the results of character rectangle detection. The height of each character rectangle is set to h, the maximum height of the line (in the illustrated case, the height of the character "度").
That is, this maximum height h is used as-is as the height of every other character in the line. The maximum height h can be found by examining the distribution of the dots that make up the line of interest.
The width of each character is set to match the horizontal extent of the dots that make up that character. This, too, is obtained from the dot distribution.
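One way to read "examining the dot distribution" is as row and column projections of a binary line image. The sketch below assumes that reading, and further assumes that characters are separated by at least one blank column; the patent does not specify either, and the helper names are hypothetical.

```python
def parse(art):
    """Turn rows of '#' (black dot) and '.' (white) into a 0/1 grid."""
    return [[1 if c == "#" else 0 for c in row] for row in art]


def line_height(line):
    """Return (top, bottom) of the rows that contain any black dot.

    bottom is exclusive, so the maximum height h is bottom - top.
    """
    rows = [y for y, row in enumerate(line) if any(row)]
    return (rows[0], rows[-1] + 1) if rows else (0, 0)


def char_rects(line):
    """Return one (top, bottom, x0, x1) rectangle per character.

    Every rectangle gets the full line height; its width is the horizontal
    extent of the character's dots, taken as a run of non-blank columns.
    """
    top, bottom = line_height(line)
    width = len(line[0]) if line else 0
    filled = [any(row[x] for row in line) for x in range(width)]
    rects, start = [], None
    for x, has_dot in enumerate(filled + [False]):   # sentinel closes last run
        if has_dot and start is None:
            start = x
        elif not has_dot and start is not None:
            rects.append((top, bottom, start, x))
            start = None
    return rects


if __name__ == "__main__":
    line = parse([
        "........",
        ".##.....",
        ".##.....",
        ".##...#.",   # a tall character and a period on the same line
        "........",
    ])
    print(char_rects(line))   # [(1, 4, 1, 3), (1, 4, 6, 7)]
```

Note that the period's rectangle spans the full line height (rows 1 to 3) even though its dots occupy only the bottom row, which is exactly what lets the later scaling keep its vertical position.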
Suppose now that the processing described above has determined the rectangle of a period symbol to have height h and width d, as shown in FIG. 4(a). Of h and d, h is the larger. Therefore, to bring this rectangle to the normalized size ℓ × ℓ, scaling is performed with the ratio ℓ/h as the normalization ratio.
That is, the height h of the rectangle is of course multiplied by the normalization ratio ℓ/h, and the width d is also multiplied by ℓ/h, giving a pattern of the normalized size as shown in FIG. 4(b). The scaling of the pattern itself can be carried out by interpolating between dots.
The width after scaling, d' (= d × ℓ/h), satisfies d' < ℓ, so a blank portion (white pixels) of width (ℓ − d')/2 is added on each of its left and right sides to obtain a normalized character rectangle of size ℓ × ℓ.
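The period example can be traced end to end with a small sketch. Two simplifications are assumptions of this sketch rather than statements from the patent: nearest-neighbour sampling stands in for the dot interpolation mentioned above, and the scaled pattern is centered with integer division when the padding (ℓ − d')/2 is not a whole number.

```python
def scale_nearest(img, new_h, new_w):
    """Resample a 0/1 grid to new_h x new_w by nearest-neighbour sampling."""
    old_h, old_w = len(img), len(img[0])
    return [[img[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
            for y in range(new_h)]


def normalize(rect, size):
    """Scale a cut-out rectangle by size / max(h, d), then pad to size x size.

    rect is the character rectangle: line height h by dot width d.
    """
    h, d = len(rect), len(rect[0])
    w = max(h, d)                              # reference side W
    new_h = max(1, round(h * size / w))
    new_w = max(1, round(d * size / w))
    scaled = scale_nearest(rect, new_h, new_w)
    top = (size - new_h) // 2                  # white padding above
    left = (size - new_w) // 2                 # white padding on the left
    out = [[0] * size for _ in range(size)]
    for y in range(new_h):
        for x in range(new_w):
            out[top + y][left + x] = scaled[y][x]
    return out


if __name__ == "__main__":
    # Period: line height h = 8, dot width d = 2, dots only in the bottom rows.
    period = [[0, 0]] * 6 + [[1, 1]] * 2
    for row in normalize(period, 4):
        print("".join("#" if v else "." for v in row))
    # ....
    # ....
    # ....
    # .#..
```

The output keeps the blank area above the dot, which is the property the following paragraphs rely on to distinguish the period from other small marks.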
By contrast, conventional normalization was applied to the character rectangle after the blank portions above, below, and to the left and right of the pattern of interest had been removed. The normalized pattern therefore looks like FIG. 4(c), and it can no longer be determined whether it is a period.
With the normalization of this embodiment, however, the result is as shown in FIG. 4(b), with a blank area remaining in the upper part. Extracting features from this image therefore allows the pattern to be recognized as a period without supplying any other special information.
FIGS. 5(a) to 5(c) show the overline, minus sign, and underline patterns, respectively, after normalization to the normalized size. As shown, each pattern keeps its characteristic position (one of its features) within the normalized size even after normalization. The characters can therefore be recognized by extracting features from the normalized pattern. In FIGS. 5(a) to 5(c), as explained for FIG. 4, the vertical and horizontal lengths are compared, and the pattern is normalized by the ratio between the normalized size and the longer of the two.
By contrast, for characters and figures whose height and width differ greatly, normalizing with separate vertical and horizontal magnifications distorts the original shape and position, and the result may be indistinguishable from other characters and symbols. For example, if the overline, minus sign, and underline are normalized this way, they all end up looking like FIG. 5(d). This does not happen with the normalization of this embodiment.
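The contrast between the two approaches can be reproduced on toy patterns. In the sketch below, `conventional` crops to the tight bounding box of the dots and scales each axis independently, while `proposed` keeps the line height and scales both axes by one ratio. Nearest-neighbour sampling, centered padding, and all names are assumptions of this sketch, and the 3 × 6 patterns are invented for illustration.

```python
def scale_nearest(img, new_h, new_w):
    """Resample a 0/1 grid to new_h x new_w by nearest-neighbour sampling."""
    old_h, old_w = len(img), len(img[0])
    return [[img[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
            for y in range(new_h)]


def conventional(rect, size):
    """Crop to the dots' bounding box, then scale each axis independently.

    Assumes the pattern contains at least one black dot.
    """
    ys = [y for y, row in enumerate(rect) if any(row)]
    xs = [x for x in range(len(rect[0])) if any(row[x] for row in rect)]
    crop = [row[xs[0]:xs[-1] + 1] for row in rect[ys[0]:ys[-1] + 1]]
    return scale_nearest(crop, size, size)


def proposed(rect, size):
    """Keep the line-height rectangle; scale both axes by size / max(h, d)."""
    h, d = len(rect), len(rect[0])
    w = max(h, d)
    new_h, new_w = max(1, round(h * size / w)), max(1, round(d * size / w))
    scaled = scale_nearest(rect, new_h, new_w)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    out = [[0] * size for _ in range(size)]
    for y in range(new_h):
        out[top + y][left:left + new_w] = scaled[y]
    return out


# Toy rectangles: line height 3, dot width 6, bar at the top, middle, bottom.
BAR, GAP = [1] * 6, [0] * 6
OVERLINE = [BAR, GAP, GAP]
MINUS = [GAP, BAR, GAP]
UNDERLINE = [GAP, GAP, BAR]

if __name__ == "__main__":
    old = [conventional(p, 6) for p in (OVERLINE, MINUS, UNDERLINE)]
    new = [proposed(p, 6) for p in (OVERLINE, MINUS, UNDERLINE)]
    print(old[0] == old[1] == old[2])                  # True
    print(new[0] != new[1] and new[1] != new[2])       # True
```

Under the conventional scheme all three bars fill the whole square and become identical; with a single ratio and the line-height rectangle, the bar stays in the upper, middle, or lower part of the square.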
As described above, according to this embodiment the normalized pattern retains its original features, so the recognition efficiency of character recognition processing can be improved.
Although the embodiment has been described for horizontally written documents, for vertically written documents it suffices to first detect the maximum width within the column and then cut out each character using this maximum width as its width.
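The vertical-writing variant only swaps which side comes from the line and which comes from the character. A minimal sketch of that swap, with hypothetical function names and rounding to whole dots assumed:

```python
def rect_dims(line_extent, char_extent, vertical=False):
    """Return (height, width) of the cut-out rectangle.

    Horizontal writing: height = maximum height of the row,
                        width  = horizontal dot extent of the character.
    Vertical writing:   width  = maximum width of the column,
                        height = vertical dot extent of the character.
    """
    if vertical:
        return char_extent, line_extent
    return line_extent, char_extent


def scaled_dims(height, width, size):
    """Scale both sides by size / max(height, width), rounded to whole dots."""
    w = max(height, width)
    return round(height * size / w), round(width * size / w)


if __name__ == "__main__":
    # A flat mark 6 dots tall in a 40-dot-wide column, normalized to 32.
    h, w = rect_dims(40, 6, vertical=True)
    print((h, w), scaled_dims(h, w, 32))   # (6, 40) (5, 32)
```

Everything after the cut-out (reference side, single ratio, padding to the square) is unchanged from the horizontal case.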
[Effects of the Invention]
As described above, according to the present invention, patterns are normalized while retaining their original features, so the character recognition rate can be improved.
FIG. 1 is a block diagram of the character recognition device of the embodiment.
FIG. 2 is a flowchart showing the processing of the normalization unit.
FIG. 3 is a diagram showing the principle of character rectangle detection.
FIG. 4(a) is a diagram showing the result of the normalization processing of the embodiment.
FIG. 4(b) is a diagram showing the result of conventional normalization processing.
FIGS. 5(a) to 5(c) are diagrams showing the results of the normalization processing of the embodiment.
FIG. 5(d) is a diagram showing the result of conventional normalization processing.
In the figures, 1 denotes the document image, 2 character line detection processing, 3 character rectangle detection processing, and 4 normalization processing of each character rectangle.
Claims (1)
[Claims]
In a method for normalizing segmented character patterns performed as preprocessing for character recognition, a pattern normalization method characterized in that: one of the maximum height or the maximum width in the row or column in which the character or symbol to be normalized is located is taken as a reference side; the character or symbol pattern of interest is cut out with a rectangular region whose other side is the extent over which dots of that pattern exist; and, when the cut-out pattern is normalized, its size in the height and width directions is scaled by the ratio between the reference side and the normalized size.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP1201393A JPH0365781A (en) | 1989-08-04 | 1989-08-04 | Pattern normalizing system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP1201393A JPH0365781A (en) | 1989-08-04 | 1989-08-04 | Pattern normalizing system |
Publications (1)
Publication Number | Publication Date |
---|---|
JPH0365781A true JPH0365781A (en) | 1991-03-20 |
Family
ID=16440346
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP1201393A Pending JPH0365781A (en) | 1989-08-04 | 1989-08-04 | Pattern normalizing system |
Country Status (1)
Country | Link |
---|---|
JP (1) | JPH0365781A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH07152864A (en) * | 1991-08-07 | 1995-06-16 | Hokkaido Prefecture | Graphic generating system for hand-written alphanumeric character recognition |
-
1989
- 1989-08-04 JP JP1201393A patent/JPH0365781A/en active Pending