JPH0545990B2

Info

Publication number: JPH0545990B2
Application number: JP58109188A
Authority: JP
Inventors: Yoshuki Yamashita
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1983-06-20
Filing date: 1983-06-20
Publication date: 1993-07-12
Also published as: JPS603071A

Description

DETAILED DESCRIPTION OF THE INVENTION

(Technical Field) The present invention relates to a fast and accurate feature extraction method.

(Background Art) Conventional character and figure recognition devices commonly extract strokes from a character/figure pattern and perform recognition using the positions and lengths of the extracted strokes and the mutual relationships between them. Devices of this type extract strokes in one of two ways: (1) the contour of the character figure is traced, the curvature is computed along the resulting sequence of contour points, the sequence is divided at points of large curvature, and the divided segments are combined into strokes; or (2) the character/figure pattern is thinned into a skeleton, and strokes are extracted by examining the connectivity of the skeleton pattern and tracing it to detect points of abrupt angular change. Geometric features of the extracted strokes are then used to recognize the character figure. However, with method (1) the amount of processing grows as the character/figure pattern becomes larger or more complex, which lowers processing speed. Method (2) requires thinning the pattern, and thinning introduces pattern distortion, whiskers (spurious branches), and similar problems that complicate the subsequent processing.

(Object of the Invention) An object of the present invention is to remedy these drawbacks. The invention is characterized in that sub-patterns representing the stroke components of a character/figure pattern in desired directions are extracted; each sub-pattern is scanned from points on the sides of the character circumscribing frame, and the positions of all character lines on each scanning line are detected; the distance between the scanning start point and each character line is normalized by the size of the character circumscribing frame in the scanning direction, and the sum of the N-th powers of the normalized values is extracted; and the resulting per-side arrays of N-th power sums are divided to extract a group of feature vectors. Its purpose is to provide a fast and stable character recognition method.

(Structure and Operation of the Invention) FIG. 1 shows an embodiment of the character recognition device of the present invention, FIG. 2 shows examples of sub-patterns, and FIG. 3 shows an example of feature extraction.

In FIG. 1, 1 is a photoelectric conversion unit, 2 is a pattern register, 3 is a line width calculation unit, 4 is a sub-pattern extraction unit, 5 is a character frame detection unit, 6 is a feature extraction unit, 7 is a feature matrix extraction unit, 8 is an identification unit, and 9 is a character name output.

In this embodiment, characters on a form set in the reading mechanism are converted by the photoelectric conversion unit 1 into a binary quantized digital electrical signal and stored in the pattern register 2. At the same time, the line width calculation unit 3 calculates the line width (W) of the input pattern. The sub-pattern extraction unit 4 scans the entire pattern register vertically and extracts the vertical sub-pattern (VSP) from the relationship between the run length of consecutive black bits and the line width calculated by the line width calculation unit 3. Similarly, the horizontal sub-pattern (HSP) is extracted by a horizontal scan, the right diagonal sub-pattern (RSP) by a 45° right-diagonal scan, and the left diagonal sub-pattern (LSP) by a 45° left-diagonal scan. FIG. 2 shows an example of an original pattern and its sub-patterns: a is the original pattern, b the vertical sub-pattern (VSP), c the horizontal sub-pattern (HSP), d the right diagonal sub-pattern (RSP), and e the left diagonal sub-pattern (LSP).

The character frame detection unit 5 detects the rectangular frame circumscribing the character/figure pattern in the pattern register (hereinafter, the character frame) and sends to the feature extraction unit 6 the position coordinates that define the character frame in the two-dimensional plane defined by the pattern register. The following description uses a coordinate system whose origin is the lower left corner of the character frame, with the X axis horizontal and the Y axis vertical.

The feature extraction unit 6 first processes the vertical sub-pattern. It starts a horizontal scan from a point P(0, y) on the left side, one of the vertical sides among the four sides of the character frame, and detects every change point from a white point to a black point. For each detected change point, the distance from the point P at which the scan started, that is, the difference in X coordinates, is normalized using the X-direction length of the character frame as the normalization constant, and the normalized value is raised to the N-th power (N is a constant; N = 2 in this embodiment). The sum of these values over all detected change points is stored in the array V_l(y). Here a white point denotes character background and a black point denotes a character line. Equation (1) expresses V_l(y), where Δx_k is the distance between the k-th change point and the scanning start point on the character frame side, k = 1, ..., K, and K is the number of detected change points. ΔX in equation (1) is the horizontal length of the character frame, and C is an integer conversion constant, set to C = 50 in this embodiment.
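The sub-pattern extraction performed by unit 4 can be illustrated with a minimal Python sketch for the vertical case. The source states only that the vertical sub-pattern is derived from the relationship between the black run length and the line width W (the claim says the run must be "sufficiently longer" than the line width). The function name `vertical_subpattern`, the list-of-rows image format, and the threshold multiplier `factor` are assumptions made for illustration, not values from the patent.

```python
def vertical_subpattern(pattern, line_width, factor=2.0):
    """Keep only vertical black runs of length >= factor * line_width.

    pattern: list of rows, each a list of 0 (white) or 1 (black).
    factor is a hypothetical threshold multiplier; the source does not
    give a numeric criterion.
    """
    rows = len(pattern)
    cols = len(pattern[0]) if rows else 0
    out = [[0] * cols for _ in range(rows)]
    min_run = factor * line_width
    for x in range(cols):
        y = 0
        while y < rows:
            if pattern[y][x] == 1:
                start = y
                # Advance to the end of this vertical run of black pixels.
                while y < rows and pattern[y][x] == 1:
                    y += 1
                # Copy the run only if it is long relative to the line width.
                if y - start >= min_run:
                    for yy in range(start, y):
                        out[yy][x] = 1
            else:
                y += 1
    return out


# A plus sign with line width 1: only the vertical bar should survive.
plus = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
vsp = vertical_subpattern(plus, line_width=1)
```

Under the same assumptions, the horizontal sub-pattern would follow by measuring runs along rows instead of columns, and the two diagonal sub-patterns by measuring runs along 45° lines.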

$$V_l(y) = \sum_{k=1}^{K} \left\{ \frac{\Delta x_k}{\Delta X} \times C \right\}^{2} \qquad (1)$$

The above processing is performed with every point on the two vertical sides of the character frame as a starting point. For the vertical sub-pattern this yields the array V_l(i), created by horizontal scans starting from points on the left side of the character frame, and the array V_r(i), created by horizontal scans starting from points on the right side, where i = 0, ..., YT and YT is the Y coordinate of the top side of the character frame. By similar processing, the horizontal, right diagonal, and left diagonal sub-patterns are scanned vertically from every point on the two horizontal sides of the character frame, yielding the arrays H_b(j) and H_t(j) for the horizontal sub-pattern, R_b(j) and R_t(j) for the right diagonal sub-pattern, and L_b(j) and L_t(j) for the left diagonal sub-pattern, where j = 0, ..., XR and XR is the X coordinate of the right side of the character frame. For these arrays, the subscript b denotes scans starting from points on the horizontal bottom side of the character frame, and t denotes scans starting from points on the horizontal top side. When extracting H_b(j), H_t(j), R_b(j), R_t(j), L_b(j), and L_t(j), the vertical length ΔY of the character frame is used as the normalization constant for the distance between the scanning start point and each white-to-black change point.

The feature matrix extraction unit 7 takes the eight arrays extracted by the feature extraction unit 6, divides each array into M parts (M is a constant; M = 7 in this embodiment), and computes the average of the array values within each part, thereby extracting an M × 8 feature matrix F(m, n), where m = 1, ..., M and n = 1, ..., 8. The identification unit 8 calculates the distance D given by equation (2) between the feature matrix extracted by the feature matrix extraction unit 7 and each standard character mask f(m, n) described in the same format, and outputs the category name of the standard character mask giving the smallest distance to the character name output 9.
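The three steps just described (per-scan-line sums, division into M parts with averaging, and nearest-mask identification) might be sketched in Python as follows. This is an illustration under stated assumptions rather than the patented implementation: the function names are hypothetical, the sub-pattern is assumed to be already cropped to the character frame and stored as a list of rows, ΔX is taken as the number of columns, the area outside the frame is treated as white so that a black pixel at x = 0 counts as a change point at distance 0, and the distance is assumed to be the Euclidean distance over all matrix entries.

```python
import math


def left_scan_sums(sub, C=50, N=2):
    """Per-row sums for a left-to-right scan, in the manner of equation (1).

    sub: sub-pattern cropped to the character frame, rows of 0/1.
    Assumption: delta-X is the number of columns in the frame.
    """
    width = len(sub[0])
    sums = []
    for row in sub:
        total = 0.0
        prev = 0  # assumption: background outside the frame is white
        for x, pix in enumerate(row):
            if pix == 1 and prev == 0:  # white-to-black change point
                total += (x / width * C) ** N
            prev = pix
        sums.append(total)
    return sums


def segment_means(values, M=7):
    """Divide values into M contiguous parts and average each part."""
    n = len(values)
    means = []
    for m in range(M):
        seg = values[m * n // M:(m + 1) * n // M]
        # An empty part (n < M) is reported as 0.0 in this sketch.
        means.append(sum(seg) / len(seg) if seg else 0.0)
    return means


def distance(F, f):
    """Euclidean distance between two matrices of equal shape (assumed form)."""
    return math.sqrt(sum((a - b) ** 2
                         for row_F, row_f in zip(F, f)
                         for a, b in zip(row_F, row_f)))


def classify(F, masks):
    """Return the name of the standard mask closest to F."""
    return min(masks, key=lambda name: distance(F, masks[name]))
```

For a row `[0, 1, 0, 0, 1]` with C = 50 and N = 2, the change points lie at x = 1 and x = 4 in a frame 5 columns wide, so the row contributes (1/5 × 50)² + (4/5 × 50)² = 100 + 1600 = 1700. The right-side array V_r and the top and bottom arrays would follow from the same routine applied to mirrored or transposed sub-patterns, again under the assumptions above.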

$$D = \sqrt{\sum_{m=1}^{M} \sum_{n=1}^{8} \{F(m, n) - f(m, n)\}^{2}} \qquad (2)$$

(Effects of the Invention) As explained above, the feature matrix extracted by the feature matrix extraction unit of this embodiment represents the positions, lengths, directions, and so on of the strokes of the character/figure pattern, and thus expresses properties specific to each character. As can be seen in FIG. 3, which shows two character patterns of similar shape together with a graphical representation of the arrays extracted by the feature extraction unit, local differences between characters are well reflected in the arrays, so recognition accuracy can be improved. In addition, when the feature extraction unit creates each array, the distance between the scanning start point on the character frame side and the character line is normalized by the size of the character frame in the scanning direction. This absorbs the variations in character size from writer to writer that are characteristic of handwritten characters, and enables accurate recognition. Furthermore, because features are extracted from the character/figure pattern by simple scanning, advanced recognition is possible and the device can be made smaller.
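The size normalization described above can be checked with a small Python sketch. It models a change in character size as horizontal pixel replication and takes ΔX as the number of columns in the frame; both are assumptions for illustration, as are the names `row_sum` and `widen`. Under these assumptions, the sum for a scan line does not change when the row is widened.

```python
def row_sum(row, C=50, N=2):
    """Sum of (dx / width * C)**N over white-to-black change points in one row."""
    width = len(row)  # assumption: delta-X equals the number of columns
    total, prev = 0.0, 0
    for x, pix in enumerate(row):
        if pix == 1 and prev == 0:
            total += (x / width * C) ** N
        prev = pix
    return total


def widen(row, s):
    """Replicate every pixel s times (a simple model of a larger character)."""
    return [pix for pix in row for _ in range(s)]


small = [0, 1, 0, 0, 1, 0]
large = widen(small, 3)  # same shape, three times as wide

# Change points move from x = 1, 4 (width 6) to x = 3, 12 (width 18),
# so each normalized distance dx / width is unchanged.
same = abs(row_sum(small) - row_sum(large)) < 1e-9
```

Without the division by the frame width, the widened row would give a sum nine times larger for N = 2, which is the kind of size dependence the normalization is meant to remove.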

The present invention scans, vertically or horizontally, sub-patterns in which the stroke components in each direction have been extracted from the character/figure pattern, and uses as features the sum of the N-th powers of the distances between the character frame side and the character lines, each normalized by the size of the character frame in the scanning direction. It therefore requires no complicated processing and extracts features stably, following the deformations of handwritten characters, so it can be used in fast and accurate character recognition devices.

[Brief Description of the Drawings]

FIG. 1 shows an embodiment of the character recognition device of the present invention, FIGS. 2a to 2e show examples of sub-patterns, and FIG. 3 shows an example of features. 1: photoelectric conversion unit, 2: pattern register, 3: line width calculation unit, 4: sub-pattern extraction unit, 5: character frame detection unit, 6: feature extraction unit, 7: feature matrix extraction unit, 8: identification unit, 9: character name output.

Claims

[Claims]

1. A character recognition method characterized by: scanning a character/figure pattern in a desired direction, detecting the cross-sections of the character lines in the scanning direction, and extracting those cross-sections whose length is sufficiently greater than the character line width of the character/figure pattern, thereby extracting sub-patterns for a plurality of directions; scanning in a predetermined direction with a point on a side of the character circumscribing frame of the character/figure pattern as the starting point, detecting the positions of all character lines on the scanning line, and extracting the sum of the N-th powers (N is a constant) of the values obtained by normalizing the distances between the starting point on the side and the detected character lines by the size of the character circumscribing frame in the predetermined direction, this processing being performed for each of the sub-patterns and for each scanning line with every point on at least two of the four sides of the character circumscribing frame as a starting point; extracting, for each sub-pattern, arrays of the per-scanning-line N-th power sums in units of the sides of the character circumscribing frame; dividing each extracted array of N-th power sums into M parts (M is a constant) and calculating the average of the N-th power sums in each part to extract an M-dimensional feature vector from each array, thereby extracting a group of feature vectors for the character/figure pattern; and recognizing the character by comparing the group of feature vectors with a dictionary expressed in the same format.