JPH0425588B2 - - Google Patents

Info

Publication number
JPH0425588B2
Authority
JP
Japan
Prior art keywords
strokes
character
characters
kanji
hiragana
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP57169510A
Other languages
Japanese (ja)
Other versions
JPS5960574A (en)
Inventor
Yoshihisa Fujii
Eiichiro Yamamoto
Hiroshi Kamata
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to JP57169510A
Publication of JPS5960574A
Publication of JPH0425588B2
Granted

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/24: Character recognition characterised by the processing or recognition method
    • G06V30/242: Division of the character sequences into groups prior to recognition; Selection of dictionaries

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Discrimination (AREA)

Description

[Detailed description of the invention]

(1) Technical Field of the Invention

The present invention relates to a character recognition method, and in particular to a method for discriminating between hiragana and kanji in an optical character reader that optically reads handwritten hiragana and kanji.

(2) Prior Art and Problems

Conventional optical character readers recognize hiragana and kanji by a single means. However, hiragana and kanji generally show clear differences, such as in complexity (for example, stroke count) and in the proportion of curved parts. When both are recognized by the same means, these differences cannot be captured clearly, so good recognition results cannot be obtained. In addition, post-processing of the recognition results, such as processing of words that consist only of kanji, cannot be performed accurately.

(3) Object of the Invention

The object of the present invention is to improve the recognition accuracy of hiragana and kanji, and to make post-processing of the recognition results more accurate, by discriminating the hiragana and kanji input to an optical character reader with separate recognition means.

(4) Structure of the Invention

According to the present invention, there is provided a character recognition method characterized in that the pattern of a character input to an optical character reader is classified as a many-stroke character or a few-stroke character according to three quantities: the number of loops enclosed by the character, the number of connected components (the mutually separate parts of the character), and the number of black runs (the continuous portions of the character encountered when scanning in the horizontal and vertical directions). The few-stroke characters are then further classified into few-stroke kanji and hiragana according to the contour line segment series, which is a set of predetermined line segments that make up the outline of the few-stroke character.

(5) Embodiment

The present invention is described below by way of an embodiment, with reference to the accompanying drawings.

Fig. 1 is a circuit configuration diagram for carrying out the character recognition method according to the present invention. When a character M, which may be a hiragana or a kanji, is input to the circuit of Fig. 1, the loop/connected-component count section 1 and the average black run count section 2 first separate many-stroke kanji TK from all other characters. The contour line segment series section 3 then separates few-stroke kanji SK from hiragana H.

The loop/connected-component count section 1 consists of a loop counting circuit 11, a connected component counting circuit 12, and a many-stroke/few-stroke determination circuit 13. Circuits 11 and 12 count the number of loops A and the number of connected components B, respectively, as defined in Fig. 2.

A loop A is a region enclosed by the character portion of the input character M; in the example of Fig. 2 there are two. A connected component B is a mutually separate part of the input character M; in the example of Fig. 2 there are five, as shown by the broken lines. The many-stroke/few-stroke determination circuit 13 determines whether the input character M has many or few strokes. This circuit is likewise built into the average black run count section 2, and it discriminates many-stroke kanji TK from other characters and extracts the many-stroke kanji TK (Fig. 1).

The average black run count section 2 consists of an average black run counting circuit 21 and a many-stroke/few-stroke determination circuit 22. Together with the preceding loop/connected-component count section 1, it determines the degree of complexity of the input character M.

A black run C is a continuous portion of the character encountered when scanning in the x direction or the y direction of Fig. 2. If the numbers of meshes in the x and y directions are M and N, the average black run counts n_x and n_y in each direction are given respectively by

[Formula]

[Formula]

The contour line segment series section 3 consists of a contour line segment series extraction circuit 31 and a few-stroke kanji/hiragana determination circuit 32, and has the function of discriminating few-stroke kanji SK from hiragana H.

As shown in Fig. 3, the input character M is scanned in the x direction (Fig. 3(1)) and in the y direction (Fig. 3(2)). The contour line segment series is the series of four kinds of line segment (○ and ○, ○ and ●, ● and ○, ● and ●) obtained from the combinations of whether the character is open (○) or closed (●) at each end of a segment along the edge of the character. Fig. 3(1') shows the contour pattern and the contour line segment series of the character of Fig. 3(1) as seen from the left. A contour line segment is formed on each continuous portion of the contour pattern as seen from the left. The state of each end of a contour line segment (open or closed) is determined by looking above the segment (for the upper end) or below it (for the lower end): if the contour is encountered, the end is closed; if not, it is open. These line segment series are extracted and input to the few-stroke kanji/hiragana determination circuit 32 in order to discriminate few-stroke kanji SK from hiragana H.

Comparing few-stroke kanji with hiragana, hiragana are mostly composed of curved strokes, whereas kanji are mostly composed of straight strokes. Fig. 4 shows how few-stroke kanji and hiragana are discriminated by the contour line segment series. Take as an example the contour line segments, seen from the right, of the two kinds of line a and b shown in Fig. 4. For a straight line such as a, the contour is generated as one continuous line segment, whereas for a curve such as b, it is generated as several separate contour line segments. Therefore, when the number of loops, the number of connected components, the number of black runs, and so on are the same, hiragana and kanji can be separated by the number of contour line segments.

The pattern of a character M input to the circuit configured as above is first classified, by the loop/connected-component count section 1 and the average black run count section 2, as a many-stroke kanji TK or as some other character. Because the other characters have comparatively few distinguishing features, they are then classified into few-stroke kanji SK and hiragana H by the contour line segment series section 3, which has a more detailed recognition function.

(6) Effects of the Invention

As described above, according to the present invention, the hiragana and kanji input to an optical character reader can be discriminated by separate recognition means. This improves the recognition accuracy of hiragana and kanji and also makes the post-processing based on that recognition more accurate.
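The three first-stage quantities named in the description (number of loops, number of connected components, and average black runs) can be illustrated with a minimal Python sketch. It is not the patented circuit. The binary grid representation (1 = black, 0 = white), the function names, the choice of 8-connectivity for the character and 4-connectivity for the background, and the normalization of the run totals by the number of scan lines are all assumptions, since the formulas for n_x and n_y are not reproduced in this text.

```python
from collections import deque


def count_components(grid, target, connectivity):
    """Count connected regions of cells equal to `target` using flood fill."""
    rows, cols = len(grid), len(grid[0])
    if connectivity == 8:
        steps = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                 if (dr, dc) != (0, 0)]
    else:
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != target or seen[r][c]:
                continue
            count += 1  # found an unvisited region; flood it
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for dr, dc in steps:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and grid[nr][nc] == target):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
    return count


def count_loops(grid):
    """Loops: white regions completely enclosed by black pixels."""
    cols = len(grid[0])
    # Pad with a white border so the outside background is exactly one region.
    padded = ([[0] * (cols + 2)]
              + [[0] + list(row) + [0] for row in grid]
              + [[0] * (cols + 2)])
    return count_components(padded, 0, 4) - 1


def average_black_runs(grid):
    """Return (n_x, n_y): mean black runs per row and per column (assumed form)."""
    def runs(line):
        # A run starts at every black cell whose predecessor is not black.
        return sum(1 for i, v in enumerate(line)
                   if v == 1 and (i == 0 or line[i - 1] == 0))

    rows, cols = len(grid), len(grid[0])
    n_x = sum(runs(row) for row in grid) / rows
    n_y = sum(runs([grid[r][c] for r in range(rows)])
              for c in range(cols)) / cols
    return n_x, n_y


# Toy pattern: one ring (one loop) plus one isolated dot.
sample = [
    [1, 1, 1, 0, 0],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
features = (count_components(sample, 1, 8), count_loops(sample),
            average_black_runs(sample))
```

Mixing 8-connectivity for the foreground with 4-connectivity for the background is a common convention that stops a diagonal stroke from leaking an enclosed region; the patent text does not say which convention its counting circuits 11 and 12 use.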

[Brief explanation of drawings]

Fig. 1 is a circuit configuration diagram for carrying out the character recognition method according to the present invention; Figs. 2 and 3 are conceptual diagrams explaining the circuit shown in Fig. 1; and Fig. 4 is a diagram explaining how few-stroke kanji and hiragana are discriminated by the contour line segment series.

1: loop/connected-component count section; 2: average black run count section; 3: contour line segment series section; 11: loop counting circuit; 12: connected component counting circuit; 13: many-stroke/few-stroke determination circuit; 21: average black run counting circuit; 22: many-stroke/few-stroke determination circuit; 31: contour line segment series extraction circuit; 32: few-stroke kanji/hiragana determination circuit.

Claims (1)

[Claims]

1. A character recognition method characterized in that the pattern of a character input to an optical character reader is classified as a many-stroke character or a few-stroke character according to the number of loops enclosed by the character, the number of connected components, which are the mutually separate parts of the character, and the number of black runs, which are the continuous portions of the character scanned in the horizontal and vertical directions; and in that the few-stroke characters are further classified into few-stroke kanji and hiragana according to a contour line segment series, which is a set of predetermined line segments constituting the outline of the few-stroke character.
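The two-stage classification recited in the claim can be sketched as follows, purely as an illustration. The patent text does not give the rule for splitting a contour into segments, the way the first-stage quantities are combined, or any threshold values. The split rule used here (a new segment starts after an empty row or a jump of more than `max_jump` columns in the left profile), the summed complexity score, and both thresholds are therefore hypothetical, as are all function and parameter names. The open/closed state of the segment ends is not modeled.

```python
def left_contour_segments(grid, max_jump=1):
    """Count contour segments of a binary pattern as seen from the left.

    Assumed rule: the left profile is the column of the first black pixel in
    each row; a new segment starts after an empty row or after a jump larger
    than `max_jump` columns between adjacent rows.
    """
    profile = [next((c for c, v in enumerate(row) if v == 1), None)
               for row in grid]
    segments = 0
    prev = None
    for col in profile:
        if col is not None and (prev is None or abs(col - prev) > max_jump):
            segments += 1
        prev = col
    return segments


def classify(loops, components, n_x, n_y, contour_segments,
             complexity_threshold=6.0, segment_threshold=3):
    """Return 'TK' (many-stroke kanji), 'SK' (few-stroke kanji) or 'H' (hiragana).

    Stage 1 separates many-stroke kanji using a hypothetical complexity score.
    Stage 2 separates the rest by the number of contour segments, following
    the idea that curved strokes produce more segments than straight ones.
    """
    complexity = loops + components + n_x + n_y  # hypothetical combination
    if complexity >= complexity_threshold:
        return "TK"
    return "H" if contour_segments >= segment_threshold else "SK"


# A straight vertical stroke gives one continuous contour segment.
straight = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
# A profile that jumps back and forth is split into several segments.
jagged = [[1, 0, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0]]

straight_segments = left_contour_segments(straight)
jagged_segments = left_contour_segments(jagged)
label = classify(loops=2, components=5, n_x=3.0, n_y=3.0,
                 contour_segments=jagged_segments)
```

The example call uses the counts quoted for Fig. 2 (two loops, five connected components) together with made-up run averages; under the hypothetical threshold this pattern is sent to the many-stroke branch without consulting the contour segments.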
JP57169510A 1982-09-30 1982-09-30 Character recognizing system Granted JPS5960574A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP57169510A JPS5960574A (en) 1982-09-30 1982-09-30 Character recognizing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP57169510A JPS5960574A (en) 1982-09-30 1982-09-30 Character recognizing system

Publications (2)

Publication Number Publication Date
JPS5960574A (en) 1984-04-06
JPH0425588B2 (en) 1992-05-01

Family

ID=15887845

Family Applications (1)

Application Number Title Priority Date Filing Date
JP57169510A Granted JPS5960574A (en) 1982-09-30 1982-09-30 Character recognizing system

Country Status (1)

Country Link
JP (1) JPS5960574A (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6146573A (en) * 1984-08-10 1986-03-06 Fujitsu Ltd Character recognizing device
JPH0682400B2 (en) * 1984-08-31 1994-10-19 富士通株式会社 Character recognition device
US5425110A (en) * 1993-04-19 1995-06-13 Xerox Corporation Method and apparatus for automatic language determination of Asian language documents
US5444797A (en) * 1993-04-19 1995-08-22 Xerox Corporation Method and apparatus for automatic character script determination

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS502430A (en) * 1973-05-08 1975-01-11

Also Published As

Publication number Publication date
JPS5960574A (en) 1984-04-06
