JPS6262392B2


Info

Publication number: JPS6262392B2
Application number: JP56032481A
Authority: JP
Inventors: Yoshuki Yamashita; Koichi Higuchi
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1981-03-09
Filing date: 1981-03-09
Publication date: 1987-12-25
Also published as: JPS57147782A

Description

DETAILED DESCRIPTION OF THE INVENTION: The present invention relates to a fast and accurate character recognition method.

In general, characters and figures are recognized by extracting strokes from the character/figure pattern and then extracting geometric features of those strokes, such as their positions, their lengths, and the relationships between them. The following two methods are widely used to extract strokes:

(1) The contour of the character/figure is traced, the curvature is computed along the resulting sequence of contour points, the sequence is divided at points of high curvature, and strokes are extracted by combining the divided segments.

(2) The character/figure pattern is thinned into a skeleton; the connectivity of the skeleton is examined and the skeleton is traced to detect points of abrupt change in direction and the like, from which strokes are extracted.

However, both methods have drawbacks. Method (1) requires more processing, and therefore runs more slowly, when the character/figure pattern is large or complex. Method (2) requires the pattern to be thinned, and thinning distorts the pattern and produces spurious branches ("whiskers"), which complicates the subsequent processing.

To eliminate these drawbacks, the following method has been proposed. The line width W of the character/figure pattern is detected, and the pattern is scanned in a desired direction. Only those runs of consecutive black points along the scanning direction whose length l satisfies l ≥ nW (n is a constant) are retained, which yields a sub-pattern representing the stroke components in that direction. The character frame is then divided into M×N regions (M and N are integers), and the character is recognized from the amount of character line computed in each region. FIG. 1 shows an example of recognizing the kanji 「犬」 ("dog") by this method. When the vertical sub-pattern B and the horizontal sub-pattern C are extracted from character pattern A, the strokes labeled イ and ロ in pattern A belong to neither sub-pattern B nor sub-pattern C, because their runs of consecutive black points are shorter than nW in every scanning direction.
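The run-length rule above (a run of consecutive black points is kept only if l ≥ nW) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the proposed method itself: the pattern is assumed to be a list of rows of 0/1 values, W and n are plain parameters, and the function names are hypothetical. The toy pattern mimics the situation of FIG. 1: a short diagonal stroke survives in neither the horizontal nor the vertical sub-pattern.

```python
# Sketch: directional sub-patterns by the run-length rule l >= n*W.
# Assumed representation: a pattern is a list of rows, each a list of 0/1.

Pattern = list[list[int]]


def keep_long_runs(line: list[int], min_len: float) -> list[int]:
    """Keep only the runs of 1s in `line` whose length is at least min_len."""
    out = [0] * len(line)
    start = None
    for k, bit in enumerate(line + [0]):  # the sentinel 0 closes a trailing run
        if bit and start is None:
            start = k                      # a run begins at k
        elif not bit and start is not None:
            if k - start >= min_len:       # run length l = k - start
                out[start:k] = [1] * (k - start)
            start = None
    return out


def horizontal_subpattern(p: Pattern, w: float, n: float) -> Pattern:
    """Scan every row and keep runs with l >= n*W (horizontal sub-pattern)."""
    return [keep_long_runs(row, n * w) for row in p]


def vertical_subpattern(p: Pattern, w: float, n: float) -> Pattern:
    """Scan every column and keep runs with l >= n*W (vertical sub-pattern)."""
    kept_cols = [keep_long_runs(list(col), n * w) for col in zip(*p)]
    return [list(row) for row in zip(*kept_cols)]


# Toy pattern: a cross plus a two-pixel diagonal stroke at the lower left.
p = [
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
]
hsp = horizontal_subpattern(p, w=1, n=3)
vsp = vertical_subpattern(p, w=1, n=3)
print(hsp[5][1], vsp[5][1])  # 0 0: the diagonal stroke is in neither sub-pattern
```

With W = 1 and n = 3, the bars of the cross are kept, while the pixels at (5, 1) and (6, 0) are 0 in both sub-patterns, so the diagonal stroke leaves no trace in the features.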

That is, the features representing these strokes are lost, which makes the character difficult to recognize.

To eliminate these drawbacks of the prior art, the present invention, when extracting the sub-patterns that represent the stroke components in each direction, additionally extracts an undefined sub-pattern consisting of those black bits of the original pattern that belong to none of the directional sub-patterns. A feature matrix that includes this undefined sub-pattern is then created, and characters and figures are recognized from it. The invention is described in detail below.

FIG. 2 shows an embodiment of a character recognition device according to the present invention. The optical signal input 1 is converted by the photoelectric conversion unit 2 into a binary, quantized digital electrical signal and stored in the pattern register 3. At the same time, the line width calculation unit 4 calculates the line width of the input pattern. The vertical sub-pattern extraction unit 5 scans the entire pattern register vertically and extracts the vertical sub-pattern (VSP) from the relationship between the lengths of runs of consecutive black bits and the line width calculated by unit 4. Likewise, the horizontal sub-pattern extraction unit 6 extracts the horizontal sub-pattern (HSP) by a horizontal scan, the right-diagonal sub-pattern extraction unit 7 extracts the right-diagonal sub-pattern (RSP) by a 45° right-diagonal scan, and the left-diagonal sub-pattern extraction unit 8 extracts the left-diagonal sub-pattern (LSP) by a 45° left-diagonal scan.
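A 45° scan differs from the horizontal and vertical scans only in how pixels are grouped into lines. The sketch below groups pixels by diagonal and applies the same l ≥ nW rule. It assumes that the right-diagonal scan (RSP) follows "/" diagonals, rising to the right, and the left-diagonal scan (LSP) follows "\" diagonals, and it counts run length in pixels along the diagonal rather than in Euclidean length. These choices, and all names, are illustrative assumptions rather than details given in the text.

```python
# Sketch: 45-degree sub-patterns (RSP, LSP) with the run-length rule l >= n*W.
# Assumptions: RSP follows "/" diagonals (row + col constant), LSP follows
# "\" diagonals (col - row constant); run length is counted in pixels.

Pattern = list[list[int]]


def keep_long_runs(line: list[int], min_len: float) -> list[int]:
    """Keep only the runs of 1s in `line` whose length is at least min_len."""
    out = [0] * len(line)
    start = None
    for k, bit in enumerate(line + [0]):  # sentinel 0 closes a trailing run
        if bit and start is None:
            start = k
        elif not bit and start is not None:
            if k - start >= min_len:
                out[start:k] = [1] * (k - start)
            start = None
    return out


def diagonal_subpattern(p: Pattern, w: float, n: float, rising: bool) -> Pattern:
    """Scan all 45-degree diagonals of p and keep runs with l >= n*W."""
    rows, cols = len(p), len(p[0])
    diagonals: dict[int, list[tuple[int, int]]] = {}
    for r in range(rows):
        for c in range(cols):
            key = r + c if rising else c - r
            # Pixels are appended in increasing row order, so consecutive
            # entries of one list are neighbours along that diagonal.
            diagonals.setdefault(key, []).append((r, c))
    out = [[0] * cols for _ in range(rows)]
    for coords in diagonals.values():
        line = [p[r][c] for r, c in coords]
        for (r, c), bit in zip(coords, keep_long_runs(line, n * w)):
            out[r][c] = bit
    return out


p = [
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
]
rsp = diagonal_subpattern(p, w=1, n=3, rising=True)   # keeps the 4-pixel "/" stroke
lsp = diagonal_subpattern(p, w=1, n=3, rising=False)  # drops the 2-pixel "\" stroke
```

Grouping by a diagonal key keeps the scan a single pass over the image; measuring length in Euclidean units instead would only change the threshold passed to `keep_long_runs`.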

The undefined sub-pattern extraction unit 9 extracts the undefined sub-pattern (USP), made up of those black bits stored in the pattern register 3 that belong to none of the sub-patterns above. The USP is obtained by applying the following logical formula (1) to HSP, VSP, RSP, LSP, and the original pattern P:

$$\mathrm{USP} = P \cdot \overline{\mathrm{HSP}} \cdot \overline{\mathrm{VSP}} \cdot \overline{\mathrm{RSP}} \cdot \overline{\mathrm{LSP}} \qquad (1)$$

Here · denotes logical AND and the overbar denotes logical NOT, applied bit by bit.

The character frame detection unit 10 detects the character frame circumscribing the character pattern in the pattern register 3 and sends the result to the character frame division determination unit 11. Unit 11 determines the coordinates of the division points on the X and Y axes that divide the detected character frame into M×N regions (M and N are integers; M = N = 5 in this embodiment). Here the X axis is the horizontal direction of the character frame and the Y axis is the vertical direction.
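Detecting the circumscribing character frame and choosing division points can be sketched as follows. The text does not say how unit 11 places the division points; this sketch assumes the frame is split into near-equal integer intervals, and the function names are hypothetical.

```python
# Sketch: circumscribing character frame and M x N division points.
# Assumption: division points split the frame into near-equal intervals.
from typing import Optional

Pattern = list[list[int]]


def character_frame(p: Pattern) -> Optional[tuple[int, int, int, int]]:
    """Return (x0, y0, x1, y1), the inclusive bounding box of the black bits."""
    ys = [r for r, row in enumerate(p) if any(row)]
    xs = [c for c in range(len(p[0])) if any(row[c] for row in p)]
    if not ys:
        return None  # blank pattern: no frame
    return xs[0], ys[0], xs[-1], ys[-1]


def division_points(lo: int, hi: int, parts: int) -> list[int]:
    """Boundaries b[0..parts]; interval k covers pixels b[k] .. b[k+1]-1."""
    span = hi - lo + 1
    return [lo + (k * span) // parts for k in range(parts + 1)]


p = [[0] * 12 for _ in range(12)]
for r in range(1, 11):            # a filled block spanning rows 1..10, cols 2..11
    for c in range(2, 12):
        p[r][c] = 1
x0, y0, x1, y1 = character_frame(p)
xb = division_points(x0, x1, 5)   # M = N = 5 as in the embodiment
yb = division_points(y0, y1, 5)
```

If the frame is narrower than the number of parts, some intervals come out empty; a real implementation would need a policy for that case, which the text does not specify.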

The feature matrix extraction unit 12 uses the division-point coordinates determined by unit 11 to divide the character frame region on each of the VSP, HSP, RSP, LSP, and USP sub-pattern registers into M×N regions, counts the number of black bits B_ij in each region, and, using the line width W, computes by equation (2) a feature representing the character line length. This yields an M×N×5-dimensional feature matrix L_ij:

$$L_{ij} = \frac{B_{ij}}{W} \qquad (2)$$

The feature matrices are then normalized: the VSP feature matrix by the length ΔX of the character frame along the x axis, the HSP feature matrix by its length ΔY along the y axis, and the RSP, LSP, and USP feature matrices by $\sqrt{\Delta X^2 + \Delta Y^2}$. The result is the final M×N×5-dimensional normalized feature matrix.
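Equation (2) and the subsequent normalization can be sketched as below. The division-point boundaries are passed in (for example from an equal split of the character frame). Assigning the row index i to the Y axis, using pixel-inclusive frame sizes for ΔX and ΔY, and keying the matrices by sub-pattern name are assumptions of this sketch. The normalizers follow the text as written: VSP by ΔX, HSP by ΔY, the others by √(ΔX² + ΔY²).

```python
# Sketch: feature matrix L_ij = B_ij / W per sub-pattern, then normalization.
import math

Pattern = list[list[int]]
Matrix = list[list[float]]


def feature_matrix(sub: Pattern, xb: list[int], yb: list[int], w: float) -> Matrix:
    """Eq. (2): L_ij = B_ij / W for every region of one sub-pattern.

    Region (i, j) covers rows yb[i]..yb[i+1]-1 and columns xb[j]..xb[j+1]-1;
    B_ij is the number of black bits of `sub` in that region.
    """
    out = []
    for i in range(len(yb) - 1):
        row = []
        for j in range(len(xb) - 1):
            b_ij = sum(sub[r][c]
                       for r in range(yb[i], yb[i + 1])
                       for c in range(xb[j], xb[j + 1]))
            row.append(b_ij / w)
        out.append(row)
    return out


def normalize(features: dict[str, Matrix], dx: float, dy: float) -> dict[str, Matrix]:
    """Divide VSP by dX, HSP by dY, and RSP, LSP, USP by sqrt(dX^2 + dY^2)."""
    diag = math.hypot(dx, dy)
    scale = {"VSP": dx, "HSP": dy, "RSP": diag, "LSP": diag, "USP": diag}
    return {name: [[v / scale[name] for v in row] for row in mat]
            for name, mat in features.items()}


sub = [[1] * 4 for _ in range(4)]
f = feature_matrix(sub, xb=[0, 2, 4], yb=[0, 2, 4], w=2)  # each region: B = 4, L = 2.0
norm = normalize({"VSP": f, "USP": f}, dx=3.0, dy=4.0)     # VSP / 3, USP / 5
```

Dividing the black-bit count by the line width turns an area into an approximate stroke length, and the per-direction normalizers make the features independent of character size.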

The identification unit 13 applies the distance D defined by equation (3) between each standard character mask f_n and the normalized feature matrix f_i, and outputs to the character name output 14 the category name of the standard character mask that minimizes D.

$$D = \sqrt{\sum_{k} \left(f_{n,k} - f_{i,k}\right)^2} \qquad (3)$$

where k runs over all M×N×5 elements of the two matrices. In this way, the input character pattern is recognized.
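Equation (3) is a Euclidean distance once each M×N×5 feature matrix is read as a flat vector, and the identification step is then a nearest-neighbour search over the standard character masks. A minimal sketch, assuming the masks are stored as flat lists keyed by category name (a hypothetical layout, not one described in the text):

```python
# Sketch: identification by the minimum distance D of eq. (3).
import math


def flatten(matrices: list[list[list[float]]]) -> list[float]:
    """Read the five M x N matrices (VSP, HSP, RSP, LSP, USP) as one vector."""
    return [v for mat in matrices for row in mat for v in row]


def distance(f_n: list[float], f_i: list[float]) -> float:
    """Eq. (3): Euclidean distance between a mask and the input features."""
    if len(f_n) != len(f_i):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_n, f_i)))


def identify(masks: dict[str, list[float]], f_i: list[float]) -> str:
    """Return the category name of the standard mask that minimises D."""
    return min(masks, key=lambda name: distance(masks[name], f_i))


masks = {"cat_a": [0.0, 0.0], "cat_b": [3.0, 4.0]}  # hypothetical 2-element masks
print(identify(masks, [2.5, 3.5]))  # cat_b
```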

FIG. 3 shows an example of recognizing the kanji 「犬」 ("dog") by the method of the present invention: A is the original pattern, B the VSP, C the HSP, D the RSP, E the LSP, and F the USP.
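Formula (1) is a bit-by-bit logical operation, so the USP can be computed directly from the original pattern and the four directional sub-patterns. The sketch below uses a small hand-made pattern rather than the 「犬」 pattern of FIG. 3: a cross whose bars are captured by HSP and VSP, plus a short diagonal stroke that every directional scan misses and that therefore lands in the USP. The pattern and names are illustrative assumptions.

```python
# Sketch: formula (1), USP = P AND NOT HSP AND NOT VSP AND NOT RSP AND NOT LSP.
Pattern = list[list[int]]


def undefined_subpattern(p: Pattern, hsp: Pattern, vsp: Pattern,
                         rsp: Pattern, lsp: Pattern) -> Pattern:
    """Bitwise; by De Morgan this equals P AND NOT (HSP OR VSP OR RSP OR LSP)."""
    return [[int(bool(p[r][c]) and not (hsp[r][c] or vsp[r][c]
                                        or rsp[r][c] or lsp[r][c]))
             for c in range(len(p[0]))]
            for r in range(len(p))]


size = 7
p = [[0] * size for _ in range(size)]
for k in range(size):
    p[2][k] = 1            # horizontal bar
for r in range(5):
    p[r][3] = 1            # vertical bar
p[5][1] = p[6][0] = 1      # short diagonal stroke, missed by every scan
hsp = [[1 if r == 2 else 0 for c in range(size)] for r in range(size)]
vsp = [[1 if c == 3 and r < 5 else 0 for c in range(size)] for r in range(size)]
empty = [[0] * size for _ in range(size)]
usp = undefined_subpattern(p, hsp, vsp, empty, empty)
print([(r, c) for r in range(size) for c in range(size) if usp[r][c]])
# [(5, 1), (6, 0)]
```

Every black bit of P now appears in at least one of the five sub-patterns, which is why the feature matrix no longer loses the stroke.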

As described above, introducing the undefined sub-pattern (USP) in this embodiment eliminates the missing strokes seen in the conventional character recognition method.

When extracting sub-patterns representing strokes in desired directions from a character pattern, the present invention additionally extracts an undefined sub-pattern by applying a simple logical formula, and performs recognition with a feature matrix that includes it. This not only prevents strokes from being lost but also enables simple and accurate character recognition.

[Brief Description of the Drawings]

FIG. 1 is an explanatory diagram of sub-pattern extraction by the conventional method, FIG. 2 is a configuration diagram of an embodiment of the present invention, and FIG. 3 shows examples of sub-pattern extraction according to the present invention.

1 … optical signal input; 2 … photoelectric conversion unit; 3 … pattern register; 5 to 9 … sub-pattern extraction units; 10 … character frame detection unit; 11 … character frame division determination unit; 12 … feature matrix extraction unit; 13 … identification unit; 14 … character name output.

Claims


1. A character recognition method in which a digital signal obtained by photoelectrically converting and quantizing a character figure is stored in a pattern register as an original pattern; first sub-patterns representing the stroke components in each direction, and a second sub-pattern representing the black-bit components of the original pattern that do not belong to the first sub-patterns, are extracted from the original pattern and stored in sub-pattern registers; the area within the character frame of the original pattern is divided into M×N regions (M and N are integers); a feature amount representing the character line length in each region is normalized by the character size to create a feature matrix; and the feature matrix is compared with a standard character matrix to recognize the character figure.