JPH0425588B2

Info

Publication number: JPH0425588B2
Application number: JP57169510A
Authority: JP
Inventors: Yoshihisa Fujii; Eiichiro Yamamoto; Hiroshi Kamata
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1982-09-30
Filing date: 1982-09-30
Publication date: 1992-05-01
Also published as: JPS5960574A

Description

[Detailed description of the invention]

(1) Technical Field of the Invention

The present invention relates to a character recognition method, and in particular to a method for discriminating between hiragana and kanji in an optical character reader that optically reads handwritten hiragana and kanji.

(2) Prior Art and Problems

Conventional optical character readers recognize hiragana and kanji by a single means, even though the two generally differ clearly in complexity, such as the number of strokes, and in the proportion of curved parts. When both are recognized by the same means, these differences cannot be captured clearly, so good recognition results cannot be obtained, and post-processing of the recognition results, such as the processing of words consisting only of kanji, cannot be performed accurately.

(3) Object of the Invention

The object of the present invention is to improve the recognition accuracy for hiragana and kanji, and to make post-processing of the recognition results more accurate, by discriminating the hiragana and kanji input to an optical character reader using separate recognition means.

(4) Structure of the Invention

According to the present invention, there is provided a character recognition method characterized in that the pattern of a character input to an optical character reader is classified as a character with many strokes or a character with few strokes according to the number of loops, which are the regions enclosed by the character, the number of connected components, which are the mutually separated parts of the character, and the number of black runs, which are the continuous parts of the character encountered when it is scanned in the horizontal and vertical directions, and in that the characters with few strokes are further classified into kanji with few strokes and hiragana according to a contour line segment series, which is a set of predetermined line segments constituting the outline of the character.

(5) The present invention is described below by way of an embodiment with reference to the accompanying drawings.

FIG. 1 is a circuit configuration diagram for implementing the character recognition method according to the present invention. When a character M, drawn from hiragana and kanji, is input to the circuit of FIG. 1, the loop count / connected component count section 1 and the average black run count section 2 first separate kanji with many strokes (TK) from all other characters, and the contour line segment series section 3 then separates kanji with few strokes (SK) from hiragana (H).

The loop count / connected component count section 1 consists of a loop counting circuit 11, a connected component counting circuit 12, and a many-stroke / few-stroke determination circuit 13. Circuits 11 and 12 count, respectively, the number of loops A and the number of connected components B as defined in FIG. 2.

A loop A is a region enclosed by the character portion of the input character M; in the example of FIG. 2 there are two. A connected component B is one of the mutually separated parts of the input character M; in the example of FIG. 2 there are five, as indicated by the broken lines. The many-stroke / few-stroke determination circuit 13 determines whether the input character M has many or few strokes. This circuit is also incorporated in the average black run count section 2, and it discriminates the many-stroke kanji TK from other characters and extracts the many-stroke kanji TK (FIG. 1).

The average black run count section 2 consists of an average black run counting circuit 21 and a many-stroke / few-stroke determination circuit 22 and, together with the preceding loop count / connected component count section 1, determines the degree of complexity of the input character M.

A black run C is a continuous stretch of the character encountered when the character is scanned in the x direction or the y direction of FIG. 2. If the numbers of mesh cells in the x and y directions are M and N, the average black run counts n_x and n_y in each direction are given, respectively, by

[Formula]

and

[Formula]
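The three complexity features described above (loops A, connected components B and average black runs) can be illustrated with a minimal Python sketch. Several choices here are assumptions rather than anything stated in the patent: the character is a list of equal-length strings with `#` for black cells and `.` for white cells, black cells are joined with 8-connectivity, loops are counted as white regions that do not reach the image border, and each average is the total number of black runs divided by the number of scan lines in that direction (the patent's own formulas are not reproduced in this text). All function names are hypothetical.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def _count_regions(grid, target, neighbours):
    """Count connected regions of cells equal to `target` by flood fill."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != target or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in neighbours:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == target and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def connected_components(grid):
    """Number of mutually separated black parts (assumed 8-connectivity)."""
    return _count_regions(grid, "#", N8)


def loops(grid):
    """Number of white regions fully enclosed by black cells."""
    cols = len(grid[0])
    # Pad with a white border so all outside background forms one region.
    padded = ["." * (cols + 2)] + ["." + row + "." for row in grid] + ["." * (cols + 2)]
    return _count_regions(padded, ".", N4) - 1


def runs_in_line(line):
    """Number of black runs in one scan line (counts where each run starts)."""
    return sum(1 for i, ch in enumerate(line)
               if ch == "#" and (i == 0 or line[i - 1] != "#"))


def average_black_runs(grid):
    """Average black runs per row (x scan) and per column (y scan)."""
    rows, cols = len(grid), len(grid[0])
    n_x = sum(runs_in_line(row) for row in grid) / rows
    columns = ["".join(grid[r][c] for r in range(rows)) for c in range(cols)]
    n_y = sum(runs_in_line(col) for col in columns) / cols
    return n_x, n_y


# A square ring with one detached dot to its right.
GLYPH = [
    "####..",
    "#..#..",
    "#..#.#",
    "####..",
]
```

For `GLYPH`, the ring and the dot give two connected components, the ring encloses one loop, and the row scans contain 1, 2, 3 and 1 black runs, so the x-direction average under the assumed normalization is 1.75.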

The contour line segment series section 3 consists of a contour line segment series extraction circuit 31 and a few-stroke / hiragana determination circuit 32, and has the function of discriminating few-stroke kanji SK from hiragana H.

A contour line segment series is obtained, as shown in FIG. 3, by scanning the input character M in the x direction (FIG. 3 (1)) and in the y direction (FIG. 3 (2)). It is a series of line segments of four kinds, formed by the combinations of whether each end of a segment at the edge of the character is open (○) or closed (●): ○ and ○, ○ and ●, ● and ○, ● and ●. FIG. 3 (1') shows the contour pattern and the contour line segment series of the character of FIG. 3 (1) as seen from the left. A contour line segment is formed from each continuous part of the contour pattern as seen from the left. The state of each end of a contour line segment (open or closed) is determined by looking above the segment (for its upper end) or below it (for its lower end): if the view meets the contour, the end is closed; if it does not, the end is open. These line segment series are extracted and input to the few-stroke / hiragana determination circuit 32 in order to discriminate few-stroke kanji SK from hiragana H.

Comparing few-stroke kanji with hiragana, hiragana are mostly composed of curved strokes, whereas kanji are mostly composed of straight strokes. FIG. 4 shows the technique for discriminating few-stroke kanji from hiragana by the contour line segment series. Taking as an example the contour line segments, as seen from the right, of the two kinds of stroke a and b shown in FIG. 4: for a straight line such as a, the contour is generated as one continuous line segment, whereas for a curve such as b it is generated divided into several contour line segments. Therefore, when the number of loops, the number of connected components, the number of black runs and so on are the same, hiragana and kanji can be separated by whether the number of contour line segments is large or small.

The pattern of a character M input to the circuit configured as above is first classified, by the loop count / connected component count section 1 and the average black run count section 2, as a many-stroke kanji TK or as some other character. Because the other characters have comparatively few distinguishing features, they are then classified into few-stroke kanji SK and hiragana H by the contour line segment series section 3, which has a more detailed recognition function.

(6) Effects of the Invention

As described above, according to the present invention the hiragana and kanji input to an optical character reader can be discriminated by separate recognition means, so the recognition accuracy for hiragana and kanji can be improved and the post-processing based on that recognition can be made more accurate.
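As a rough illustration of the contour line segment idea, the following Python sketch extracts segments from the contour pattern seen from the left. It is one possible reading and not the patent's circuit: the text does not say how "continuous" is decided, so the sketch assumes a segment breaks when the left-hand profile jumps by more than a hypothetical `max_step` columns between adjacent rows, or when a row is empty. The open/closed test is likewise an assumption: an end counts as closed if any black cell lies in the same column above the upper end or below the lower end. The character is a list of equal-length strings with `#` for black cells.

```python
def left_profile(grid):
    """Column of the first black cell in each row, or None for an empty row."""
    profile = []
    for row in grid:
        idx = row.find("#")
        profile.append(None if idx < 0 else idx)
    return profile


def contour_segments(grid, max_step=1):
    """Split the left-hand profile into continuous segments.

    Returns a list of (top_row, bottom_row) pairs, both inclusive.
    `max_step` is a hypothetical continuity threshold, not from the patent.
    """
    profile = left_profile(grid)
    segments, start = [], None
    for r, col in enumerate(profile):
        if col is None:
            # An empty row ends any segment in progress.
            if start is not None:
                segments.append((start, r - 1))
                start = None
            continue
        if start is None:
            start = r
        elif abs(col - profile[r - 1]) > max_step:
            # A large jump in the profile starts a new segment.
            segments.append((start, r - 1))
            start = r
    if start is not None:
        segments.append((start, len(profile) - 1))
    return segments


def end_states(grid, segment):
    """Return (upper_end, lower_end), each 'open' or 'closed' (assumed test)."""
    profile = left_profile(grid)
    top, bottom = segment

    def blocked(rows, col):
        return any(grid[r][col] == "#" for r in rows)

    upper = "closed" if blocked(range(0, top), profile[top]) else "open"
    lower = "closed" if blocked(range(bottom + 1, len(grid)), profile[bottom]) else "open"
    return upper, lower


# A straight vertical stroke and a curved stroke, both viewed from the left.
BAR = [
    "..#..",
    "..#..",
    "..#..",
    "..#..",
    "..#..",
]
HOOK = [
    "###..",
    "...#.",
    "....#",
    "...#.",
    "###..",
]
```

Under these assumptions `BAR` yields a single segment while `HOOK` yields three, which mirrors the observation that a straight stroke produces one contour line segment and a curved stroke produces several.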

[Brief explanation of drawings]

FIG. 1 is a circuit configuration diagram for implementing the character recognition method according to the present invention; FIGS. 2 and 3 are conceptual explanatory diagrams of the circuit shown in FIG. 1; and FIG. 4 is a diagram explaining the technique for discriminating few-stroke kanji from hiragana by the contour line segment series.

1 ... loop count / connected component count section, 2 ... average black run count section, 3 ... contour line segment series section, 11 ... loop counting circuit, 12 ... connected component counting circuit, 13 ... many-stroke / few-stroke determination circuit, 21 ... average black run counting circuit, 22 ... many-stroke / few-stroke determination circuit, 31 ... contour line segment series extraction circuit, 32 ... few-stroke / hiragana determination circuit.

Claims

[Claims]

1. A character recognition method characterized in that the pattern of a character input to an optical character reader is classified as a character with many strokes or a character with few strokes according to the number of loops enclosed by the character, the number of connected components, which are the mutually separated parts of the character, and the number of black runs, which are the continuous parts of the character encountered when it is scanned in the horizontal and vertical directions; and in that the characters with few strokes are further classified into kanji with few strokes and hiragana according to a contour line segment series, which is a set of predetermined line segments constituting the outline of the character with few strokes.
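The two-stage classification in the claim can be sketched in Python as follows. This is only an illustration under stated assumptions: the patent gives no threshold values and does not say how the determination circuits combine their inputs, so every number in `Thresholds` is hypothetical, the first stage is assumed to flag a character as many-stroke when any one complexity feature exceeds its limit, and the names `Features`, `Thresholds` and `classify` are invented for this example. The only rule taken from the text is that, among few-stroke characters, a larger number of contour line segments points to hiragana.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Features:
    loops: int              # number of enclosed regions
    components: int         # number of mutually separated parts
    avg_runs_x: float       # average black runs, x-direction scan
    avg_runs_y: float       # average black runs, y-direction scan
    contour_segments: int   # number of contour line segments


@dataclass(frozen=True)
class Thresholds:
    # All values are hypothetical placeholders, not taken from the patent.
    max_loops: int = 2
    max_components: int = 3
    max_avg_runs: float = 2.0
    max_kanji_segments: int = 4


def classify(f, t=Thresholds()):
    """Return 'TK' (many-stroke kanji), 'SK' (few-stroke kanji) or 'H' (hiragana)."""
    # Stage 1: complexity features separate many-stroke kanji from the rest.
    # Combining the tests with OR is an assumption.
    many_strokes = (
        f.loops > t.max_loops
        or f.components > t.max_components
        or max(f.avg_runs_x, f.avg_runs_y) > t.max_avg_runs
    )
    if many_strokes:
        return "TK"
    # Stage 2: curved strokes split into more contour segments, so a high
    # segment count is taken to indicate hiragana.
    return "H" if f.contour_segments > t.max_kanji_segments else "SK"
```

In practice the thresholds would have to be tuned on labelled handwriting samples, and the second stage might use the open/closed states of the segment ends as well as the segment count.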