JPS5822479A - Character recognition device - Google Patents

Character recognition device

Info

Publication number
JPS5822479A
JPS5822479A
Authority
JP
Japan
Prior art keywords
pattern
stroke
points
character
input character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP56121615A
Other languages
Japanese (ja)
Other versions
JPH026113B2 (en)
Inventor
Keiji Kobayashi
啓二 小林
Masataka Yamamoto
山本 勝敬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Computer Basic Technology Research Association Corp
Original Assignee
Computer Basic Technology Research Association Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Computer Basic Technology Research Association Corp filed Critical Computer Basic Technology Research Association Corp
Priority to JP56121615A priority Critical patent/JPS5822479A/en
Publication of JPS5822479A publication Critical patent/JPS5822479A/en
Publication of JPH026113B2 publication Critical patent/JPH026113B2/ja
Granted legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/16 Image preprocessing
    • G06V30/168 Smoothing or thinning of the pattern; Skeletonisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Discrimination (AREA)
  • Character Input (AREA)
  • Image Processing (AREA)

Abstract

PURPOSE: To recognize handwritten characters with high precision by extracting straight line segments (strokes) from an input character pattern and performing recognition using the midpoint patterns of those strokes. CONSTITUTION: The image signal of an input character on a slip 11, obtained by scanning with a scanning means 12, is used by a preprocessing means 13 to thin the input character pattern. A stroke extracting means 14 then finds feature points such as end points, branch points, and inflection points, and obtains the segments connecting those points. The means 14 further checks the direction of each segment at the feature points other than end points and joins the segments, thus extracting the strokes. A stroke midpoint pattern is then generated from the center positions of the strokes and sent to a determining means 15. The means 15 computes the similarity of this stroke midpoint pattern to that of each previously stored reference character to determine what the input character is.

Description

DETAILED DESCRIPTION OF THE INVENTION The present invention relates to a character recognition device for recognizing characters with many straight lines, and more particularly to a character recognition device for recognizing handwritten Chinese characters.

Conventionally, a pattern matching method has been used in devices that recognize Chinese characters, particularly devices that recognize printed Chinese characters. This method was effective for characters with a constant shape, such as printed kanji. However, as shown in FIG. 1, if the input character pattern 3 is even slightly tilted with respect to the reference character pattern 2 within the entry frame 1, the degree of similarity between the two becomes small. Furthermore, as shown in FIG. 2, when the input character pattern 5 has a different line width from the reference character pattern 4, the degree of similarity between the two also decreases.
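As a toy illustration of this brittleness (not from the patent; the grids and function names are invented for the example), even a one-pixel shift of a thin binary stroke can drive a pixel-overlap similarity from perfect to zero:

```python
# Sketch: pixel-overlap similarity of binary character patterns. This is a
# hypothetical example, not the patent's matching method.

def overlap_similarity(a, b):
    """Intersection-over-union of 'on' pixels in two equal-size binary grids."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    union = sum(x | y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter / union if union else 1.0

def shift_right(grid, k):
    """Shift a binary grid k cells to the right, padding with zeros."""
    return [[0] * k + row[:len(row) - k] if k else row[:] for row in grid]

# A one-pixel-wide vertical stroke in an 8x8 grid.
ref = [[1 if c == 3 else 0 for c in range(8)] for r in range(8)]
print(overlap_similarity(ref, ref))                  # identical: 1.0
print(overlap_similarity(ref, shift_right(ref, 1)))  # shifted one cell: 0.0
```

A thick or tilted stroke degrades less abruptly, but the example shows why raw template matching is sensitive to exactly the variations handwriting exhibits.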

Therefore, the pattern matching method has the disadvantage that it is difficult to obtain a high recognition rate for characters such as handwritten Chinese characters in which each stroke is tilted with respect to the reference pattern or the line width is not constant.

In order to eliminate these drawbacks, this invention is characterized by extracting straight line segments (hereinafter referred to as strokes) from the input character pattern and performing recognition using the midpoint pattern of these strokes. The purpose is to provide a character recognition device with high reading accuracy. Hereinafter, the present invention will be explained in detail using the drawings.

FIG. 3 is a block diagram showing an embodiment of the apparatus of the present invention. First, the input characters on the form 11 are scanned by the scanning means 12, and the input character pattern is thinned by the pre-processing means 13 using the obtained image signal of the input characters.
Next, strokes are extracted from the thinned character pattern by the stroke extraction means 14, and a stroke midpoint pattern is created from the center position of each stroke and sent to the determination means 15.

The determination means 15 computes the similarity between this stroke midpoint pattern and the pre-stored stroke midpoint pattern of each reference character, and determines what the input character is.

FIG. 4 shows an example in which the kanji 有 has been thinned by the preprocessing means 13; the character portion 16 of the thinned input character pattern is indicated by "1".

FIG. 5 shows the result of extracting, from the character portion 16 of the thinned input character pattern, feature points such as end points, branch points, and inflection points by the stroke extraction means 14, together with the line segments (hereinafter called segments) connecting those feature points. The feature points 17₁, 17₂, ... are marked in the figure, and the segments 18 are labeled "A" through "N". For example, the segments connected to the feature point 17₁ are the segment 19 labeled "A", the segment 20 labeled "B", the segment 21 labeled "C", and the segment 22 labeled "D".

Furthermore, the stroke extraction means 14 extracts strokes by examining the direction of each segment at the feature points other than end points and joining segments. Stroke extraction is performed as follows.

First, consider a pair of segments (two segments together are called a segment pair) connected to a feature point Pᵢ. For the first segment, let the feature point Pᵢ be its end point and Pⱼ its other end, and obtain its direction vector from Pⱼ to Pᵢ. For the second segment, let the feature point Pᵢ be its start point and Pₖ its other end, and obtain its direction vector from Pᵢ to Pₖ. Letting θ be the angle between the direction vectors of the first and second segments at the feature point Pᵢ, the degree of agreement of the directions of the two segments is defined as cos θ. Among all the segments connected to the feature point Pᵢ, the segment pair whose degree of agreement is largest and is at or above a predetermined threshold is joined to form a stroke. This stroke-forming process is then repeated for the segments other than that segment pair.
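The joining rule can be sketched roughly as follows. This is a hypothetical reading of the procedure; the function names, the data layout (segments given by their far endpoints), and the threshold value are assumptions for illustration, not the patent's specification:

```python
import math

def direction(p_from, p_to):
    """Unit direction vector from p_from to p_to."""
    dx, dy = p_to[0] - p_from[0], p_to[1] - p_from[1]
    n = math.hypot(dx, dy)
    return (dx / n, dy / n)

def best_pair(pi, far_ends, threshold=0.9):
    """Among the segments meeting at feature point pi (each given by its far
    endpoint), find the pair whose directions agree most: maximize
    cos(theta) between (Pj -> Pi) and (Pi -> Pk), joining only when the
    agreement exceeds the threshold."""
    best, best_cos = None, threshold
    for j, pj in enumerate(far_ends):
        for k, pk in enumerate(far_ends):
            if j == k:
                continue
            u = direction(pj, pi)   # first segment, pointing into pi
            v = direction(pi, pk)   # second segment, pointing out of pi
            c = u[0] * v[0] + u[1] * v[1]  # cos(theta) of unit vectors
            if c > best_cos:
                best, best_cos = (j, k), c
    return best, best_cos

# Four segments meet at pi, with far endpoints left, right, up, down.
pi = (0.0, 0.0)
ends = [(-1.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
pair, c = best_pair(pi, ends)
print(pair)  # (0, 1): the collinear left-right pair is joined
```

The collinear pair scores cos θ = 1 and is joined into one stroke; the perpendicular combinations score cos θ = 0 and fall below the threshold.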

FIG. 6 shows the strokes obtained by the stroke extraction means 14 using the above method. That is, the segments 19 and 22 of FIG. 5 are joined to extract the stroke 23 labeled "1", and the segments 20 and 21 are likewise joined to extract the stroke 24 labeled "2". Each extracted stroke is labeled with a stroke number.

Next, the stroke midpoint is obtained from each of these strokes. If the coordinates of the start point of a stroke are (Xs, Ys) and those of its end point are (Xt, Yt), the coordinates of the stroke midpoint are ((Xs + Xt)/2, (Ys + Yt)/2).

FIG. 7 is a diagram showing a stroke midpoint pattern 25 of the input character pattern extracted by the above processing, and the stroke midpoints are numbered in correspondence with the stroke numbers in FIG. 6.

FIG. 8 shows the stroke midpoint pattern 34 of the reference character, obtained by the same processing. The determination means 15 computes the similarity between the stroke midpoint pattern 25 of the input character and the stroke midpoint pattern 34 of the reference character stored in the determination means 15, and determines what the input character is.

Specifically, each of the stroke midpoints 35 to 42 of the reference character is first associated with the closest of the stroke midpoints 26 to 33 of the input character.

In this example, the stroke midpoints 26 to 33 are associated with the stroke midpoints 35 to 42 of the reference character.

Next, the distances between the associated pairs of points are summed, and the reciprocal of the sum is taken as the similarity of the input character to the reference character. Finally, the reference character with the largest similarity is determined to be the recognized character. Since recognition uses the stable stroke midpoints, higher recognition accuracy is obtained than with conventional methods.
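The decision procedure, nearest-midpoint association, summing the distances, taking the reciprocal as similarity, and choosing the best-scoring reference, could be sketched as follows. The labels, point sets, and handling of the zero-distance case are invented for illustration:

```python
import math

def similarity(input_pts, ref_pts):
    """Associate each reference midpoint with the nearest input midpoint,
    sum the pairwise distances, and return the reciprocal of the sum."""
    total = sum(min(math.dist(r, p) for p in input_pts) for r in ref_pts)
    # A zero sum means a perfect match; treat it as unbounded similarity.
    return float('inf') if total == 0 else 1.0 / total

def recognize(input_pts, references):
    """Return the label of the reference character with the largest similarity."""
    return max(references, key=lambda label: similarity(input_pts, references[label]))

# Hypothetical midpoint patterns for two reference characters.
refs = {
    'A': [(1, 1), (3, 1), (2, 3)],
    'B': [(0, 0), (4, 4), (4, 0)],
}
inp = [(1.1, 0.9), (3.0, 1.2), (2.1, 2.8)]
print(recognize(inp, refs))  # A
```

The input midpoints sit close to reference 'A', so its summed distance is small and its reciprocal similarity dominates.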

Although the above embodiment describes the case of recognizing handwritten kanji characters, the present invention is not limited to this and may be used to recognize characters with many straight line segments, such as handwritten katakana characters.

Furthermore, as the determination method of the determination means 15, a method was described in which the similarity is obtained from the distances between the stroke midpoint pattern of the input character and that of the reference character; however, the invention is not limited to this, and a method may instead be used in which a pre-weighted stroke midpoint pattern of the reference character is directly superimposed on the stroke midpoint pattern of the input character to obtain the similarity.

As explained above, according to this invention, recognition is performed using the stroke midpoint pattern extracted after thinning. The method is therefore stable against small variations in line-segment slope and in line width, and it is little affected by the joining or separation of character lines, which is a weakness of methods based on feature points such as end points and branch points, so handwritten characters can be recognized with high accuracy.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1 and 2 are diagrams for explaining examples of variations in input character patterns; FIG. 3 is a block diagram showing an embodiment of the device of the present invention; FIG. 4 is a diagram showing an example of a thinned input character pattern; FIG. 5 is a diagram showing an example of an input character pattern from which segments have been extracted; FIG. 6 is a diagram showing an example of an input character pattern from which strokes have been extracted; FIG. 7 is a diagram showing an example of the stroke midpoint pattern of an input character; and FIG. 8 is a diagram showing an example of the stroke midpoint pattern of a reference character. In the figures, 11 is a form, 12 a scanning means, 13 a preprocessing means, 14 a stroke extraction means, and 15 a determination means. The same reference numerals in the figures denote the same or corresponding parts. Agent: 葛野信− (and one other).

Claims (1)

[Claims] A character recognition device for recognizing characters recorded on a recording medium such as a form, comprising: scanning means for scanning and photoelectrically converting the characters; preprocessing means for thinning the input character pattern obtained by the scanning means; stroke extraction means for extracting straight line segments from the thinned input character pattern; and determination means for determining a character using the midpoint pattern of the straight line segments obtained by the stroke extraction means; wherein the stroke extraction means extracts, from the input character pattern thinned by the preprocessing means, feature points such as end points, branch points, and inflection points, obtains the line segments connecting them, and extracts straight line segments by joining pairs of equally directed line segments connected at feature points other than end points; and a character is recognized using the similarity between the midpoint pattern of the straight line segments and the midpoint pattern of the straight line segments of a reference character pattern stored in advance in the determination means.
JP56121615A 1981-08-03 1981-08-03 Character recognition device Granted JPS5822479A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP56121615A JPS5822479A (en) 1981-08-03 1981-08-03 Character recognition device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP56121615A JPS5822479A (en) 1981-08-03 1981-08-03 Character recognition device

Publications (2)

Publication Number Publication Date
JPS5822479A true JPS5822479A (en) 1983-02-09
JPH026113B2 JPH026113B2 (en) 1990-02-07

Family

ID=14815633

Family Applications (1)

Application Number Title Priority Date Filing Date
JP56121615A Granted JPS5822479A (en) 1981-08-03 1981-08-03 Character recognition device

Country Status (1)

Country Link
JP (1) JPS5822479A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010218128A (en) * 2009-03-16 2010-09-30 Ricoh Co Ltd Image processing apparatus and method, program, and recording medium


Also Published As

Publication number Publication date
JPH026113B2 (en) 1990-02-07

Similar Documents

Publication Publication Date Title
KR100322982B1 (en) Facial Image Processing Equipment
KR910010353A (en) Fingerprint Identification Method
Kiyko Recognition of objects in images of paper based line drawings
JPS5822479A (en) Character recognition device
US6208756B1 (en) Hand-written character recognition device with noise removal
JP2675303B2 (en) Character recognition method
KR102389066B1 (en) Face Image Generating Method for Recognizing Face
JP3140079B2 (en) Ruled line recognition method and table processing method
JP2000357231A (en) Image discrimination device
JPH02166583A (en) Character recognizing device
JPS58222384A (en) Discriminating system of font
JP3193573B2 (en) Character recognition device with brackets
JPS5812080A (en) Character recognizing device
JP2001060250A (en) Method and device for character recognition
JP2832035B2 (en) Character recognition device
JPH0830717A (en) Character recognition method and device therefor
JP2935331B2 (en) Figure recognition device
JP2974396B2 (en) Image processing method and apparatus
KR930000034B1 (en) Korean characters font dividing method using run length code
JP2962984B2 (en) Character recognition device
JPH03229386A (en) Character recognizing device
JPH0246988B2 (en)
JPH0353392A (en) Character recognizing device
JPS58201183A (en) Feature extracting method of handwritten character recognition
JPS6361382A (en) Character component removing method for linear image