JPS6162986A

JPS6162986A - Recognition order determining system

Info

Publication number: JPS6162986A
Application number: JP59186011A
Authority: JP
Inventors: Hiroshi Matsumura; 松村　博; Tatsunosuke Iwahara; 岩原　達之助
Original assignee: Tokyo Sanyo Electric Co Ltd; Sanyo Electric Co Ltd
Current assignee: Tokyo Sanyo Electric Co Ltd; Sanyo Electric Co Ltd
Priority date: 1984-09-05
Filing date: 1984-09-05
Publication date: 1986-03-31

Abstract

PURPOSE: To improve the recognition rate by determining, for each character type category, a priority level corresponding to a learning stage, and determining the recognition order of candidate character type categories on the basis of the calculated degrees of resemblance and these priority levels. CONSTITUTION: A binary character pattern from a character observing part 1 is input to a pattern matching part 4 through a feature extracting part 2, and degrees of resemblance between the extracted feature pattern and standard feature patterns are calculated. Character type categories in a dictionary part 3 are divided into category sets in accordance with the learning stages of school education, and a priority level is determined for each category set. Operating parts 4a-4d in the pattern matching part 4 attach the priority level of the category set to each character type code and degree of resemblance and store them in a candidate memory 5. A knowledge part 6 determines the recognition order of candidate character type categories on the basis of the calculated degrees of resemblance and priority levels, and the character type codes are stored in a result memory 8 in the recognition order.

Description

DETAILED DESCRIPTION OF THE INVENTION

(a) Field of Industrial Application
The present invention relates to a character recognition system for recognizing handwritten Chinese characters (kanji), and more particularly to a method for determining the recognition order of candidate character type categories.

(b) Prior Art
In general, a character recognition system calculates the degree of similarity between a feature pattern extracted from an input character pattern and the standard feature pattern of each character type category registered in advance in a dictionary section, and selects the n candidate character type categories with the highest similarity. The candidate character type category with the highest similarity is output as the recognition result, and, so that misrecognitions can be corrected, the n selected candidates are assigned recognition orders from first to n-th in descending order of similarity.
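As a point of reference, the conventional similarity-only ordering described above can be sketched in Python as follows. The function name `rank_by_similarity` and the representation of similarities as a dictionary of scores are illustrative assumptions; the patent does not specify either.

```python
from typing import Dict, List


def rank_by_similarity(similarities: Dict[str, float], n: int) -> List[str]:
    """Return the n character type codes with the highest similarity, best first.

    `similarities` maps a character type code to its similarity score
    (hypothetical representation; larger means more similar).
    """
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return [code for code, _ in ranked[:n]]
```

The first element of the returned list plays the role of the recognition result, and the remaining elements are the fallback candidates used when correcting a misrecognition.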

However, determining the recognition order from similarity alone, as described above, leads to many misrecognitions. Methods have therefore been devised that apply some form of post-processing after multiple candidate character type categories have been selected by similarity, and then determine the recognition order.

The post-processing proposed so far includes grammatical processing, as disclosed in Japanese Patent Application Laid-Open No. 59-32082, and determining whether the characters before and after the character being recognized are kanji, katakana, or hiragana, as in Japanese Patent Application Laid-Open No. 59-27381.

(c) Problems to Be Solved by the Invention
In the prior art that performs grammatical processing as post-processing, the knowledge section, such as a grammatical dictionary, becomes enormous, and the processing itself becomes very complex. In the method that determines whether the preceding and following characters are kanji, katakana, and so on, no improvement in the recognition rate can be expected when the selected candidate character type categories are all kanji or all hiragana.

Furthermore, a difficult character type category has many features, so it is easy to distinguish from other character type categories and is misrecognized relatively rarely. A simple character type category has few features, so many candidate character type categories with closely spaced similarities are selected, and a difficult character type category often ends up high in the recognition order, causing a misrecognition.

(d) Means for Solving the Problems
In the present invention, a priority corresponding to the learning stage of school education is determined in advance for each character type category, and the recognition order of the candidate character type categories is determined on the basis of the calculated similarity and this priority.

(e) Operation
According to the present invention, candidate character type categories that have a high similarity and a high priority according to the learning stage of school education come higher in the recognition order, so character type categories that are simple and well known to the general public are given precedence.

In addition, the knowledge section only needs to store the relationship between similarity and priority.

(f) Embodiment
FIG. 1 is a block diagram of a character recognition system to which the present invention is applied. (1) is a character observation section that reads characters written on an input document and outputs the result as a binary character pattern; (2) is a feature extraction section that extracts a feature pattern from the input character pattern; (3) is a dictionary section that stores a standard feature pattern for each character type category; and (4) is a pattern matching section that matches the extracted feature pattern against the standard feature patterns and calculates the similarity between them.

The character type categories in the dictionary section (3) are grouped according to the learning stage of school education, and a priority is set for each category set. Specifically, as shown in FIG. 2, all character type categories are divided into three category sets: characters learned in the first to third years of elementary school form category set 1 (3a), characters learned in the fourth to sixth years of elementary school form category set 2 (3b), and characters learned in junior high school or later form category set 3 (3c). Category sets 1 to 3 are assigned priorities 1 to 3, in that order.
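The grouping by learning stage can be sketched as a small lookup. This is a minimal illustration only: the function name `category_set_priority` and the `school`/`grade` encoding are assumptions, and priority 1 is treated as the highest because the description gives precedence to characters from the earliest learning stage.

```python
def category_set_priority(school: str, grade: int) -> int:
    """Return the priority (1 = highest) of the category set for a character
    first taught at the given stage.

    `school` is "elementary" or "junior_high" (hypothetical encoding).
    """
    if school == "elementary" and 1 <= grade <= 3:
        return 1  # category set 1 (3a): elementary school, years 1 to 3
    if school == "elementary" and 4 <= grade <= 6:
        return 2  # category set 2 (3b): elementary school, years 4 to 6
    return 3      # category set 3 (3c): junior high school and above
```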

The pattern matching section (4) has three operation units (4a) to (4c), one for each of category sets 1 to 3. Each operation unit selects n candidate character type categories from its category set in descending order of similarity, and stores their character type codes and calculated similarities in the candidate memory (5). In doing so, the operation unit attaches the priority of its category set to the character type code and similarity, so that these three items are stored in the candidate memory (5) as the information for each candidate character type category. In this way, the candidate memory (5) holds n candidates from each category set, for a total of 3n candidate character type categories.
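The per-set selection step can be sketched as follows. This is an illustration under stated assumptions: feature patterns are modeled as integer sequences, similarity is measured by city block distance (the example measure the description names, where a smaller distance means a higher similarity), and the names `city_block_distance`, `select_candidates`, and `Candidate` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# (character type code, city block distance, priority)
Candidate = Tuple[str, int, int]


def city_block_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute element-wise differences between two feature patterns."""
    return sum(abs(x - y) for x, y in zip(a, b))


def select_candidates(
    feature: Sequence[int],
    category_sets: Dict[int, Dict[str, Sequence[int]]],
    n: int,
) -> List[Candidate]:
    """Pick the n closest categories from each category set.

    `category_sets` maps a priority (1, 2, 3) to that set's standard feature
    patterns, keyed by character type code. Each selected candidate carries
    its code, its distance, and the priority of the set it came from, which
    mirrors the three items stored in the candidate memory.
    """
    candidates: List[Candidate] = []
    for priority, patterns in category_sets.items():
        scored = sorted(
            (city_block_distance(feature, pattern), code)
            for code, pattern in patterns.items()
        )
        candidates.extend((code, distance, priority) for distance, code in scored[:n])
    return candidates
```

With three category sets the returned list holds up to 3n entries, matching the contents of the candidate memory (5).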

The knowledge section (6) stores the relationship between similarity and priority. The clustering control processing section (7) refers to the contents of the knowledge section (6), determines the recognition order of the top n of the 3n candidate character type categories stored in the candidate memory (5), and stores their character type codes in the result memory (8) in recognition order. For example, if the city block distance Di is used as the similarity measure, with a smaller distance meaning a higher similarity, the knowledge section (6) stores, as shown in FIG. 3, the relationship between the distance Di and whether the recognition order may be rearranged according to priority.

The way the recognition order is determined is explained below using a specific example.

First, a candidate character type category with character type code Mi, city block distance Di, and priority P (i = a, b, c, ...; P = 1, 2, 3) is written as (Mi, Di, P). Suppose, for example, that as shown in FIG. 4(a) the five candidate character type categories with the highest similarity have been selected from each of category sets 1 to 3, and that their city block distances satisfy Dv < Dp < Da < Dq < Db < Dr < Dw.

If, as in the prior art, the recognition order were determined only from the magnitude of the city block distance, it would be as shown in FIG. 4(b).

Suppose, however, that the three thresholds held in the knowledge section (6), denoted DA, DB, and DC, and the calculated city block distances satisfy DA < Dv, Da < DB < Dq, and Dw < DC. The clustering control processing section (7) then places character type codes Mv, Mp, and Ma in rank B and character type codes Mq, Mb, Mr, and Mw in rank C, and rearranges the recognition order by priority within each rank. As shown in FIG. 4(c), the character type code with the higher priority comes first within each rank, so the recognition order is determined from both similarity and priority. The top five character type codes, beginning with Ma, Mp, Mv, and Mb, are stored in this order in the result memory (8). The answer output control section (9) sends the first-ranked character type code, Ma, to a character display device such as a word processor or personal computer as the recognition result, and that character is displayed. If this turns out to be a misrecognition, the character type code next in the recognition order is sent, and so on down the order until the correct recognition result is obtained.
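The output behavior just described, sending the first-ranked code and falling back to the next one on a misrecognition, can be sketched as a simple loop. The function name `output_until_accepted` and the `is_correct` callback (standing in for the operator's confirmation) are assumptions made for illustration.

```python
from typing import Callable, Iterable, Optional


def output_until_accepted(
    ranked_codes: Iterable[str],
    is_correct: Callable[[str], bool],
) -> Optional[str]:
    """Send codes in recognition order until one is accepted.

    `ranked_codes` is the content of the result memory, best first.
    `is_correct` is a hypothetical callback that reports whether the
    displayed character was the intended one. Returns the accepted code,
    or None if every stored candidate was rejected.
    """
    for code in ranked_codes:
        if is_correct(code):
            return code
    return None
```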

Accordingly, candidate character type categories that have a high similarity and belong to an early learning stage of school education are given precedence, characters that are simple and well known to the general public come higher in the recognition order, and the recognition rate improves.

As another example, suppose the city block distance Dv is sufficiently small compared with the other candidates that it falls below the first threshold DA in the knowledge section (6), that is, Dv < DA, while DA < Dp and Db is below the second threshold DB, that is, Db < DB. Character type code Mv is then placed in rank A and its recognition order is not rearranged, while the rank B character type codes Mp, Ma, Mq, and Mb are rearranged by priority, giving the recognition order shown in FIG. 4(d). In this case too, candidate character type categories from earlier learning stages come higher in the recognition order than in FIG. 4(b).
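The two worked examples above can be sketched as a banding step followed by a priority sort inside each band. Everything numeric here is hypothetical: the distances, the threshold values, and the priority assigned to each code are chosen only so that the candidates fall into the ranks the text describes, since FIG. 3 and FIG. 4 are not reproduced. The names `rank_candidates` and `swap_allowed` are likewise assumptions. Candidates at or beyond the last threshold are dropped in this sketch.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

# (character type code, city block distance, priority)
Candidate = Tuple[str, int, int]


def rank_candidates(
    candidates: Sequence[Candidate],
    thresholds: Sequence[int],
    swap_allowed: Sequence[bool],
) -> List[str]:
    """Order candidates by rank band, then by priority where swaps are allowed.

    `thresholds` are ascending upper bounds of ranks A, B, C, ...; a candidate
    with distance d belongs to the first rank whose threshold exceeds d.
    `swap_allowed[k]` says whether rank k may be reordered by priority.
    """
    by_distance = sorted(candidates, key=lambda c: c[1])
    ordered: List[Candidate] = []
    for rank in range(len(thresholds)):
        members = [c for c in by_distance if bisect_right(thresholds, c[1]) == rank]
        if swap_allowed[rank]:
            # Stable sort: equal priorities keep their distance order.
            members.sort(key=lambda c: c[2])
        ordered.extend(members)
    return [code for code, _, _ in ordered]


# Hypothetical priorities: a, b from set 1; p, q, r from set 2; v, w from set 3.
PRIORITY: Dict[str, int] = {"a": 1, "b": 1, "p": 2, "q": 2, "r": 2, "v": 3, "w": 3}


def with_priority(distances: Dict[str, int]) -> List[Candidate]:
    return [(code, d, PRIORITY[code]) for code, d in distances.items()]


# First example: no candidate in rank A; v, p, a in rank B; q, b, r, w in rank C.
first = with_priority({"v": 12, "p": 20, "a": 30, "q": 45, "b": 50, "r": 60, "w": 70})
print(rank_candidates(first, [10, 40, 100], [False, True, True])[:5])
# prints ['a', 'p', 'v', 'b', 'q']

# Second example: v alone in rank A (not reordered); p, a, q, b in rank B.
second = with_priority({"v": 5, "p": 20, "a": 30, "q": 45, "b": 50, "r": 60, "w": 70})
print(rank_candidates(second, [10, 55, 100], [False, True, True])[:5])
# prints ['v', 'a', 'b', 'p', 'q']
```

Sorting within a band rather than across all candidates is what keeps a clearly closer match in rank A ahead of a higher-priority but more distant candidate.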

In this embodiment, the character type categories are divided in advance into category sets according to priority, and the priority information is obtained from the set. Alternatively, the priority information may be stored in advance in the dictionary section (3) together with each standard feature pattern.
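One way to picture this alternative is a dictionary entry that carries its own priority next to the standard feature pattern. The sketch below is an assumption about how such a record might look; the names `DictionaryEntry` and `score_entries`, the integer feature vectors, and the use of city block distance are all illustrative and not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class DictionaryEntry:
    code: str                  # character type code
    pattern: Tuple[int, ...]   # standard feature pattern
    priority: int              # learning-stage priority stored with the pattern


def score_entries(
    feature: Sequence[int],
    dictionary: Sequence[DictionaryEntry],
) -> List[Tuple[str, int, int]]:
    """Return (code, city block distance, priority) for every entry, closest first."""
    scored = [
        (
            entry.code,
            sum(abs(x - y) for x, y in zip(feature, entry.pattern)),
            entry.priority,
        )
        for entry in dictionary
    ]
    return sorted(scored, key=lambda item: item[1])
```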

Priorities according to learning stage can be assigned easily by referring to the Ministry of Education's teaching guides for elementary school and junior high school.

(g) Effects of the Invention
According to the present invention, character type categories that are simple and well known to the general public are given precedence, so the recognition rate improves. In addition, no enormous knowledge section is required, and the recognition order can be determined in a short processing time.

In particular, because the priorities follow the learning stages of school education, the method has generality, which makes it very effective when a character recognition device is used as an input means for a word processor or personal computer.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a character recognition system to which the present invention is applied, FIG. 2 is an explanatory diagram showing the contents of the category sets, FIG. 3 is an explanatory diagram showing the contents of the knowledge section, and FIGS. 4(a) to 4(d) are explanatory diagrams showing a specific example of recognition order determination.

Explanation of main reference numerals: (1) character observation section, (2) feature extraction section, (3) dictionary section, (4) pattern matching section, (5) candidate memory, (6) knowledge section, (7) clustering control processing section, (8) result memory, (9) answer output control section.

Claims

(1) A recognition order determining method for a character recognition system that calculates the degree of similarity between a feature pattern extracted from an input character pattern and the standard feature pattern of each character type category registered in advance in a dictionary section, and thereby selects a plurality of candidate character type categories, characterized in that a priority corresponding to the learning stage of school education is determined for each character type category, and the recognition order of the plurality of candidate character type categories is determined on the basis of the similarity obtained by the calculation and the priority.