JPS638514B2

Info

Publication number: JPS638514B2
Application number: JP56134840A
Authority: JP
Inventors: Koichi Higuchi; Yoichi Yamada; Yoshuki Yamashita
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1981-08-29
Filing date: 1981-08-29
Publication date: 1988-02-23
Also published as: JPS5837780A

Description

[Detailed Description of the Invention]

The present invention relates to a simple, fast, and accurate character recognition method.

In general, characters and figures are recognized by extracting strokes from the character/figure pattern and then extracting geometric features such as the positions and lengths of the extracted strokes and the relationships among them. The following two methods are widely used for stroke extraction.

In the first method, the curvature is calculated along the series of contour points detected by tracing the contour of the character/figure. The contour series is divided at points of large curvature, and strokes are extracted by combining the divided series.

In the second method, the character/figure pattern is thinned into a skeleton. The skeleton pattern is then traced and its connectivity examined, and strokes are extracted by detecting features such as points where the angle changes abruptly.

However, the first method has the drawback that when the character/figure pattern is large or complex, the amount of processing grows and processing speed drops. The second method has the drawback that the pattern must be thinned, and thinning causes problems such as pattern distortion and the generation of whiskers (spurious branches), which complicate subsequent processing.

To eliminate these drawbacks, the following method has been proposed. The line width W of the character/figure pattern is detected and the pattern is scanned in a desired direction. Sub-patterns representing the stroke component in the scanning direction are extracted by picking out the written points whose run length $l$ in the scanning direction satisfies $l \ge nW$ (n is a constant) with respect to the line width W. The character frame is then divided into M×N regions (M and N are integers), and the amount of character line in each region is calculated and used for recognition.

However, as the example of the kanji "木" in FIG. 1 shows, the vertical sub-pattern of A should properly fall at the position shown in B. When the character is deformed as in C, the vertical sub-pattern shifts to a different divided region, as in D. In other words, the extracted features become unstable, which makes character identification difficult. Absorbing such character deformation made the subsequent processing more complex, which in turn reduced processing speed.

The object of the present invention is to eliminate this drawback of the prior art by applying blur processing to the feature matrix and performing character recognition using the blurred feature matrix. The invention is described in detail below.

FIG. 2 shows an embodiment of the character recognition device of the present invention. The optical signal input 1 is converted by the photoelectric conversion unit 2 into a binary quantized digital electrical signal and stored in the pattern register 3. At the same time, the line width calculation unit 4 calculates the line width of the input pattern. The vertical sub-pattern extraction unit 5 scans the entire pattern register vertically and extracts the vertical sub-pattern (VSP) from the relationship between the length of each run of consecutive black bits and the line width calculated by the line width calculation unit 4. Similarly, the horizontal sub-pattern extraction unit 6 extracts the horizontal sub-pattern (HSP) by a horizontal scan, the right-diagonal sub-pattern extraction unit 7 extracts the right-diagonal sub-pattern (RSP) by a 45° right-diagonal scan, and the left-diagonal sub-pattern extraction unit 8 extracts the left-diagonal sub-pattern (LSP) by a 45° left-diagonal scan.
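The four directional scans can be sketched as a single run-length filter. This is only an illustration under stated assumptions: the image is a list of lists of 0/1, the test is taken to be "run length ≥ n × line width", the value n = 2 is arbitrary, and the mapping of "right diagonal" and "left diagonal" to the step vectors below is a guess. The names `DIRECTIONS` and `extract_subpattern` are hypothetical.

```python
# Scan steps (dy, dx) per direction; the diagonal assignments are assumptions.
DIRECTIONS = {
    "VSP": (1, 0),    # vertical scan
    "HSP": (0, 1),    # horizontal scan
    "RSP": (1, -1),   # 45-degree scan, assumed "right diagonal"
    "LSP": (1, 1),    # 45-degree scan, assumed "left diagonal"
}


def extract_subpattern(image, step, line_width, n=2):
    """Keep only black pixels lying on a run of length >= n * line_width
    along the scan direction `step` = (dy, dx)."""
    h, w = len(image), len(image[0])
    dy, dx = step
    out = [[0] * w for _ in range(h)]
    min_len = n * line_width
    for y in range(h):
        for x in range(w):
            if not image[y][x]:
                continue
            # A run starts only where the previous pixel along the scan is not black.
            py, px = y - dy, x - dx
            if 0 <= py < h and 0 <= px < w and image[py][px]:
                continue
            # Walk along the scan direction to collect the whole run.
            run = []
            cy, cx = y, x
            while 0 <= cy < h and 0 <= cx < w and image[cy][cx]:
                run.append((cy, cx))
                cy, cx = cy + dy, cx + dx
            if len(run) >= min_len:
                for ry, rx in run:
                    out[ry][rx] = 1
    return out
```

Applied to a plus-shaped pattern with line width 1, the vertical scan keeps only the vertical bar and the horizontal scan keeps only the horizontal bar, since every other pixel lies on a run of length 1 in that direction.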

The character frame detection unit 9 detects the character frame circumscribing the character pattern in the pattern register 3 and sends the result to the character frame division determination unit 10. The character frame division determination unit 10 determines the division point coordinates on the X and Y axes for dividing the detected character frame into M×N regions (M and N are integers; in this embodiment M = N = 5). Here the X axis denotes the horizontal direction of the character frame and the Y axis the vertical direction.
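A minimal sketch of frame detection and division follows. The source does not say how the division points are chosen, so equal division of the frame is an assumption here, as are the function names `character_frame` and `division_points` and the list-of-lists image layout.

```python
def character_frame(image):
    """Return (x0, y0, x1, y1), the inclusive bounding box of the black pixels."""
    ys = [y for y, row in enumerate(image) if any(row)]
    xs = [x for x in range(len(image[0])) if any(row[x] for row in image)]
    if not ys:
        raise ValueError("image contains no black pixels")
    return xs[0], ys[0], xs[-1], ys[-1]


def division_points(lo, hi, parts):
    """Split the inclusive range [lo, hi] into `parts` nearly equal intervals.

    Returns parts + 1 boundaries b, where region k covers b[k] <= t < b[k + 1].
    Equal division is an assumption, not something the source specifies.
    """
    length = hi - lo + 1
    return [lo + (k * length) // parts for k in range(parts + 1)]
```

For a frame spanning x = 2 to 11 and M = 5, `division_points(2, 11, 5)` yields the boundaries 2, 4, 6, 8, 10, 12, so each region is two pixels wide.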

The feature matrix extraction unit 11 uses the division point coordinates determined by the character frame division determination unit 10 to divide the character frame area on each of the VSP, HSP, RSP, and LSP sub-pattern registers into M×N regions. It counts the number of black bits $B_{ij}$ in each region and, using the line width W, calculates the feature representing the character line length by equation (1), creating an M×N×4-dimensional feature matrix.

$$L_{ij} = \frac{B_{ij}}{W} \tag{1}$$

The VSP feature matrix is then normalized by the length $\Delta Y$ of the character frame in the y-axis direction, the HSP feature matrix by the length $\Delta X$ in the x-axis direction, and the RSP and LSP feature matrices by $\sqrt{(\Delta X)^2 + (\Delta Y)^2}$, finally yielding an M×N×4-dimensional normalized feature matrix.
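Equation (1) and the normalization step can be sketched as follows for one sub-pattern. The boundary lists `xb` and `yb` (half-open region boundaries along X and Y), the index convention (i along X, j along Y), and the function names are assumptions made for illustration.

```python
import math


def feature_matrix(subpattern, xb, yb, line_width):
    """Compute L[i][j] = B_ij / W, where B_ij is the black-pixel count of
    region (i, j); i indexes the X division and j the Y division."""
    m, n = len(xb) - 1, len(yb) - 1
    feats = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            black = sum(subpattern[y][x]
                        for y in range(yb[j], yb[j + 1])
                        for x in range(xb[i], xb[i + 1]))
            feats[i][j] = black / line_width
    return feats


def normalize(feats, direction, dx, dy):
    """Divide each element by the frame size along the stroke direction:
    dy for VSP, dx for HSP, and the frame diagonal for RSP and LSP."""
    diagonal = math.hypot(dx, dy)
    scale = {"VSP": dy, "HSP": dx, "RSP": diagonal, "LSP": diagonal}[direction]
    return [[v / scale for v in row] for row in feats]
```

Dividing the black-pixel count by the line width turns an area into an approximate stroke length, and dividing by the frame size makes that length independent of how large the character was written.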

The blur processing unit 12 applies blur processing to the normalized feature matrix created by the feature matrix extraction unit 11. In this embodiment, blurring is performed in the direction perpendicular to the sub-pattern extraction direction, as follows. For each element $M_{ij}$ of the horizontal feature matrix created from the horizontal sub-pattern, the element $M'_{ij}$ of the blurred feature matrix is calculated by applying equation (2).

$$M'_{ij} = \frac{C\,M_{ij} + M_{i,j-1} + M_{i,j+1}}{C + 2} \tag{2}$$

Similarly, the element $M'_{ij}$ of the blurred feature matrix is calculated by applying equation (3) to the vertical feature matrix created from the vertical sub-pattern, equation (4) to the right-diagonal feature matrix created from the right-diagonal sub-pattern, and equation (5) to the left-diagonal feature matrix created from the left-diagonal sub-pattern.

$$M'_{ij} = \frac{C\,M_{ij} + M_{i-1,j} + M_{i+1,j}}{C + 2} \tag{3}$$

$$M'_{ij} = \frac{C\,M_{ij} + M_{i-1,j+1} + M_{i+1,j-1}}{C + 2} \tag{4}$$

$$M'_{ij} = \frac{C\,M_{ij} + M_{i-1,j-1} + M_{i+1,j+1}}{C + 2} \tag{5}$$

Here C is a constant. When calculating for the outermost regions, the elements of regions lying further outside are regarded as equal to $M_{ij}$.

That is, for the subscripts p and q of a feature matrix element value $M_{pq}$ on the right-hand side of each of the above equations, if p > M, q > N, p < 1, or q < 1 holds, then $M_{pq} = M_{ij}$.

Equation (2) above means that the horizontal feature matrix element $M_{ij}$ of interest is blurred by a weighted average with its vertically adjacent elements $M_{i,j-1}$ and $M_{i,j+1}$. Because each element value is a quantity corresponding to the length of the horizontal sub-pattern in each region, the operation can be regarded as redistributing each element's length in the vertical direction. Likewise, the vertical feature matrix is redistributed in the horizontal direction, the right-diagonal matrix in the left-diagonal direction, and the left-diagonal matrix in the right-diagonal direction. The value of C is typically 4, and in this example C = 4 is used throughout.
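Equations (2) to (5), together with the boundary rule, can be sketched in one routine. The 0-based nested-list layout `feats[i][j]` and the names `BLUR_OFFSETS` and `blur` are assumptions for illustration; the neighbour offsets are read directly from the subscripts in the equations.

```python
# Neighbour offsets (di, dj) taken from the subscripts of equations (2)-(5).
BLUR_OFFSETS = {
    "HSP": ((0, -1), (0, 1)),     # equation (2)
    "VSP": ((-1, 0), (1, 0)),     # equation (3)
    "RSP": ((-1, 1), (1, -1)),    # equation (4)
    "LSP": ((-1, -1), (1, 1)),    # equation (5)
}


def blur(feats, direction, c=4):
    """Return the blurred matrix M' with M'_ij = (C*M_ij + two neighbours) / (C + 2)."""
    m, n = len(feats), len(feats[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            total = c * feats[i][j]
            for di, dj in BLUR_OFFSETS[direction]:
                p, q = i + di, j + dj
                inside = 0 <= p < m and 0 <= q < n
                # A neighbour outside the matrix is taken to equal M_ij.
                total += feats[p][q] if inside else feats[i][j]
            out[i][j] = total / (c + 2)
    return out
```

With C = 4, a single element of value 6 in the horizontal matrix becomes 4 after blurring, and each of its two vertical neighbours receives 1, which shows how a stroke that drifts into an adjacent region still contributes to the feature it would have produced undeformed.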

The identification unit 13 applies the distance D defined by equation (6) between each standard character mask (fm) and the blurred feature matrix (fi), and outputs to the character output 14 the category name of the standard character mask for which D is smallest.

$$D = \sqrt{\sum (f_m - f_i)^2} \tag{6}$$

As described above, this embodiment applies blur processing to the normalized feature matrix and therefore has the advantage of being able to absorb deformation of characters.
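The identification step can be sketched as a nearest-template search. This assumes that equation (6) is the Euclidean distance summed over all matrix elements, that each matrix has been flattened into a list of the same length, and that the standard masks are held in a dictionary keyed by category name. The dictionary layout and the names `distance` and `classify` are hypothetical.

```python
import math


def distance(f_m, f_i):
    """Euclidean distance between a standard mask and a blurred feature vector."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_m, f_i)))


def classify(masks, f_i):
    """Return the category name whose standard mask is closest to f_i.

    `masks` maps category name -> flattened standard character mask.
    """
    return min(masks, key=lambda name: distance(masks[name], f_i))
```

For example, with masks `{"A": [0, 0, 0], "B": [1, 1, 1]}`, the input `[0.9, 1.0, 0.8]` is assigned to category "B".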

In the present invention, after the normalized feature matrix is created, blur processing is applied by means of a simple conversion formula, and identification is performed using the resulting blurred feature matrix. Deformation of characters can therefore be absorbed, and a simple and accurate character recognition device can be realized.

[Brief Description of the Drawings]

FIG. 1 is an explanatory diagram of a conventional character recognition method, and FIG. 2 is a block diagram of an embodiment of the present invention.

1 ... optical signal input, 2 ... photoelectric conversion unit, 3 ... pattern register, 4 ... line width calculation unit, 5 to 8 ... sub-pattern extraction units, 9 ... character frame detection unit, 10 ... character frame division determination unit, 11 ... feature matrix extraction unit, 12 ... blur processing unit, 13 ... identification unit, 14 ... character name output.

Claims

[Claims]

1. A character recognition method in which a digital signal obtained by photoelectrically converting and quantizing a character/figure is stored in a pattern register as an original pattern; sub-patterns representing the stroke components in each direction are extracted from the original pattern and stored in sub-pattern registers; for each sub-pattern register, the region within the character frame of the original pattern is divided into M×N regions (M and N are integers); a feature matrix is created by normalizing, by the character size, the feature quantity representing the character line length in each region; and the character/figure is recognized by comparing the feature matrix with a standard character matrix; the character recognition method being characterized by identifying characters and figures using