JPS6162985A

JPS6162985A - Recognition order determining system

Info

Publication number: JPS6162985A
Application number: JP59185643A
Authority: JP
Inventors: Hiroshi Matsumura; 松村　博; Tatsunosuke Iwahara; 岩原　達之助
Original assignee: Tokyo Sanyo Electric Co Ltd; Sanyo Electric Co Ltd
Current assignee: Tokyo Sanyo Electric Co Ltd; Sanyo Electric Co Ltd
Priority date: 1984-09-04
Filing date: 1984-09-04
Publication date: 1986-03-31

Abstract

PURPOSE: To improve the recognition rate of the titled system without requiring an enormous knowledge part by determining priority levels of individual character type categories in accordance with frequency and determining the recognition order of candidate character type categories on the basis of calculated degrees of resemblance and priority levels. CONSTITUTION: A binary character pattern outputted from a character observing part 1 is inputted to a pattern matching part 4 through a feature extracting part 2, and degrees of resemblance between a feature pattern and standard feature patterns are calculated. A priority level is determined for each character type category in a dictionary part 3 in accordance with frequency. Operating parts 4a-4c select candidate character type categories from category sets in the order of the degree of resemblance, and their character type codes and degrees of resemblance are stored in a candidate memory 5 together with priority levels of category sets. A knowledge part 6 determines the recognition order of candidate character type categories on the basis of relations between degrees of resemblance and priority levels and stores character type codes in a result memory 8 in accordance with the recognition order.

Description

DETAILED DESCRIPTION OF THE INVENTION

(a) Field of Industrial Application

The present invention relates to a character recognition system for recognizing handwritten kanji, and more particularly to a method for determining the recognition order of candidate character type categories.

(b) Prior Art

In general, a character recognition system calculates the degree of similarity between a feature pattern extracted from an input character pattern and the standard feature pattern of each character type category registered in advance in a dictionary section, and selects the n candidate character type categories with the highest similarity.

The candidate character type category with the highest similarity is then output as the recognition result, and, so that misrecognition can be corrected, the n selected candidate character type categories are assigned recognition orders from first to n-th in descending order of similarity.

However, determining the recognition order from similarity alone, as described above, produces many misrecognitions. Methods were therefore devised that apply some form of post-processing to determine the recognition order after multiple candidate character type categories have been selected by similarity.

Post-processing proposed so far includes grammatical processing, as disclosed in Japanese Patent Application Laid-Open No. 59-32082, and determining whether the characters before and after the character being recognized are kanji, katakana, or hiragana, as in Japanese Patent Application Laid-Open No. 59-27381.

(c) Problems to Be Solved by the Invention

In the prior art, performing grammatical processing as post-processing requires an enormous knowledge section, such as a grammatical dictionary, and makes the processing itself very complex. In the method that determines whether the neighboring characters are kanji, katakana, and so on, no improvement in the recognition rate can be expected when the selected candidate character type categories consist only of kanji or only of hiragana.

(d) Means for Solving the Problems

The present invention sets, for each character type category, a priority according to its frequency, and determines the recognition order of the candidate character type categories on the basis of both the calculated similarity and this priority.

(e) Operation

According to the present invention, character type categories that have a high similarity and are frequently used come higher in the recognition order, and the knowledge section need only store the relationship between similarity and priority.

(f) Embodiment

Fig. 1 is a block diagram of a character recognition system to which the present invention is applied. (1) is a character observation section that reads characters written on an input document and outputs the result as a binary character pattern; (2) is a feature extraction section that extracts a feature pattern from the input character pattern; (3) is a dictionary section that stores a standard feature pattern for each character type category; and (4) is a pattern matching section that matches the extracted feature pattern against the standard feature patterns and calculates the similarity between the two.

The character type categories in the dictionary section (3) are divided according to frequency: high-frequency categories form category set 1, medium-frequency categories form category set 2, and low-frequency categories form category set 3. Priorities 1 to 3 are assigned to category sets 1 to 3, in that order.
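The frequency-based grouping described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the source does not say how frequency is measured or where the boundaries between high, medium, and low lie, so the usage counts and the `high_cutoff` and `mid_cutoff` parameters are hypothetical.

```python
from typing import Dict, List


def assign_priority(frequency: int, high_cutoff: int, mid_cutoff: int) -> int:
    """Return priority 1 (high frequency), 2 (medium), or 3 (low).

    The cutoffs are assumptions; the source only states that categories
    are grouped by frequency.
    """
    if frequency >= high_cutoff:
        return 1
    if frequency >= mid_cutoff:
        return 2
    return 3


def build_category_sets(
    frequencies: Dict[str, int], high_cutoff: int, mid_cutoff: int
) -> Dict[int, List[str]]:
    """Group character type codes into category sets 1 to 3."""
    sets: Dict[int, List[str]] = {1: [], 2: [], 3: []}
    for code, freq in frequencies.items():
        sets[assign_priority(freq, high_cutoff, mid_cutoff)].append(code)
    return sets


# Hypothetical usage counts for four character type codes.
frequencies = {"A": 900, "B": 120, "C": 3, "D": 450}
category_sets = build_category_sets(frequencies, high_cutoff=400, mid_cutoff=50)
print(category_sets)
```

Because the priority is simply the number of the category set, a candidate's priority is known as soon as the set it came from is known.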

The pattern matching section (4) has three calculation units (4a) to (4c), one for each of category sets 1 to 3. Each calculation unit selects n candidate character type categories from its category set in descending order of similarity and stores their character type codes and calculated similarities in the candidate memory (5). In doing so, the calculation unit appends the priority of its category set to the character type code and similarity, so that these three items are stored in the candidate memory (5) as the information for each candidate character type category. The candidate memory (5) thus holds n candidates from each category set, for a total of 3n candidate character type categories.
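A rough sketch of this selection step is given below. The `Candidate` record, the `select_candidates` function, and the toy `matching_elements` similarity are illustrative assumptions; the source specifies only that each unit keeps the n most similar categories of its set and stores code, similarity, and priority together.

```python
from typing import Callable, Dict, List, NamedTuple, Sequence


class Candidate(NamedTuple):
    code: str          # character type code
    similarity: float  # larger means more similar
    priority: int      # priority of the category set the candidate came from


def select_candidates(
    feature: Sequence[int],
    category_sets: Dict[int, Dict[str, Sequence[int]]],
    n: int,
    similarity: Callable[[Sequence[int], Sequence[int]], float],
) -> List[Candidate]:
    """Collect the n most similar categories from each category set."""
    candidate_memory: List[Candidate] = []
    for priority, patterns in category_sets.items():
        scored = [
            Candidate(code, similarity(feature, standard), priority)
            for code, standard in patterns.items()
        ]
        scored.sort(key=lambda c: c.similarity, reverse=True)
        candidate_memory.extend(scored[:n])
    return candidate_memory


def matching_elements(a: Sequence[int], b: Sequence[int]) -> int:
    """Toy similarity (an assumption): number of positions that agree."""
    return sum(1 for x, y in zip(a, b) if x == y)


# Hypothetical standard feature patterns, keyed by priority then code.
category_sets = {
    1: {"A": [1, 0, 1, 1], "B": [0, 0, 1, 1]},
    2: {"C": [1, 1, 1, 1], "D": [0, 1, 0, 0]},
    3: {"E": [1, 0, 1, 0], "F": [0, 0, 0, 0]},
}
memory = select_candidates([1, 0, 1, 1], category_sets, n=1,
                           similarity=matching_elements)
print(memory)
```

With three category sets and n = 1, the returned list holds 3n = 3 records, one per set, each carrying the three items of information the text describes.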

The knowledge section (6) stores the relationship between similarity and priority. The clustering control processing section (7) refers to the contents of the knowledge section (6), determines the recognition order of the top n of the 3n candidate character type categories stored in the candidate memory (5), and stores their character type codes in the result memory (8) in recognition order. For example, if the city block distance $D_i$ is used as the similarity measure, with a smaller distance meaning a greater similarity, the knowledge section (6) stores, as shown in Fig. 2, the relationship between the distance $D_i$ and whether recognition orders may be swapped according to priority.
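For reference, the city block (L1) distance between two feature vectors is the sum of the absolute differences of their elements. The sketch below assumes that feature patterns are equal-length numeric vectors, which the source does not state explicitly; the sample vectors are made up.

```python
from typing import Sequence


def city_block_distance(feature: Sequence[float], standard: Sequence[float]) -> float:
    """Sum of absolute element-wise differences (L1 distance).

    A smaller value means the input feature pattern is closer to the
    standard feature pattern, i.e. the similarity is greater.
    """
    if len(feature) != len(standard):
        raise ValueError("feature patterns must have the same length")
    return sum(abs(x - y) for x, y in zip(feature, standard))


# Hypothetical feature vectors: |3-1| + |0-1| + |2-2| + |5-2| = 6
print(city_block_distance([3, 0, 2, 5], [1, 1, 2, 2]))
```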

The determination of the recognition order is explained below using a concrete example.

First, a candidate character type category with character type code $M_i$, city block distance $D_i$, and priority $P$ ($i = a, b, c, \ldots$; $P = 1, 2, 3$) is written $(M_i, D_i, P)$. Suppose, as shown in Fig. 3(a), that the five candidates with the highest similarity have been selected from each of category sets 1 to 3, and that their city block distances satisfy $D_v < D_p < D_a < D_q < D_b < D_r < D_w$.

If, as in the prior art, only the magnitudes of the city block distances were used, the recognition order would be as shown in Fig. 3(b).

Now suppose that the thresholds $D_1$, $D_2$, $D_3$ in the knowledge section (6) and the calculated city block distances satisfy $D_1 < D_v, \ldots, D_a < D_2 < D_q, \ldots, D_w < D_3$. The clustering control processing section (7) then places character type codes $M_v$, $M_p$, $M_a$ in rank B and character type codes $M_q$, $M_b$, $M_r$, $M_w$ in rank C, and swaps recognition orders by priority within each rank. As shown in Fig. 3(c), the character type codes with higher priority therefore come first within each rank, and the recognition order is determined from both similarity and priority. The top five character type codes are stored in recognition order in the result memory (8): the rank B codes $M_p$ and $M_a$ in the first two places, followed by $M_v$, $M_b$, and $M_q$. The answer output control section (9) sends the first-ranked character type code to a character display device, such as a word processor or personal computer, as the recognition result, and that character is displayed. If this turns out to be a misrecognition, the character type code with the next recognition order is sent, and so on in sequence until the correct recognition result is obtained.

Candidate character type categories that have both a high similarity and a high frequency are thus given precedence, so such characters receive a higher recognition order and the recognition rate improves.

As another example, suppose the city block distance $D_v$ is sufficiently smaller than those of the other candidates, so that $D_v < D_1$, while $D_1 < D_p, \ldots, D_b < D_2$. Character type code $M_v$ is then placed in rank A and its recognition order is not changed, whereas the rank B character type codes $M_p$, $M_a$, $M_q$, $M_b$ are reordered by priority, giving the recognition order shown in Fig. 3(d). Compared with Fig. 3(b), the frequently used candidate character type categories again come higher in the recognition order.
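The two worked examples can be reproduced with a short sketch. Several details are assumptions rather than statements from the source: the numeric distances and priorities are made up to satisfy the stated inequalities, ties in priority keep their distance order, rank A is never reordered, and candidates at or beyond $D_3$ are kept last in distance order. The names `Candidate` and `determine_order` are hypothetical.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Sequence


class Candidate(NamedTuple):
    code: str
    distance: float  # city block distance, smaller = more similar
    priority: int    # 1 = most frequent category set


def determine_order(
    candidates: Sequence[Candidate],
    thresholds: Sequence[float],      # (D1, D2, D3), ascending
    reorder_allowed: Sequence[bool],  # one flag per rank: A, B, C, beyond D3
    n: int,
) -> List[str]:
    """Return the top-n character type codes in recognition order."""
    by_distance = sorted(candidates, key=lambda c: c.distance)
    ranks: List[List[Candidate]] = [[] for _ in range(len(thresholds) + 1)]
    for cand in by_distance:
        # Index 0 is rank A (below D1), 1 is rank B, 2 is rank C, and so on.
        ranks[bisect_right(thresholds, cand.distance)].append(cand)
    ordered: List[Candidate] = []
    for rank, allowed in zip(ranks, reorder_allowed):
        if allowed:
            # sorted() is stable, so equal priorities keep distance order.
            rank = sorted(rank, key=lambda c: c.priority)
        ordered.extend(rank)
    return [c.code for c in ordered[:n]]


# Hypothetical values satisfying D_v < D_p < D_a < D_q < D_b < D_r < D_w.
candidates = [
    Candidate("v", 10, 2), Candidate("p", 20, 1), Candidate("a", 30, 1),
    Candidate("q", 40, 2), Candidate("b", 50, 1), Candidate("r", 60, 3),
    Candidate("w", 70, 3),
]
flags = (False, True, True, False)

# Distance alone, as in the prior art.
baseline = [c.code for c in sorted(candidates, key=lambda c: c.distance)][:5]
# First example: v, p, a fall in rank B; q, b, r, w fall in rank C.
first = determine_order(candidates, (5, 35, 80), flags, n=5)
# Second example: v falls in rank A; p, a, q, b fall in rank B.
second = determine_order(candidates, (15, 55, 80), flags, n=5)
print(baseline, first, second)
```

Under these made-up priorities, the first call moves $M_v$ to third place and puts $M_b$ ahead of $M_q$, as in the first example, while the second call leaves $M_v$ first because it is alone in rank A. The only stored knowledge is the threshold list and one flag per rank, which is why the knowledge section stays small.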

In this embodiment, the character type categories are divided in advance into category sets according to priority, and the priority information is obtained from the set. Alternatively, the priority information may be stored in advance in the dictionary section (3) together with each standard feature pattern.
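One way to picture this alternative is a flat dictionary in which every entry carries its own priority. The `DictionaryEntry` layout and the `flatten_category_sets` helper below are hypothetical, as are the sample patterns; the source says only that the priority may be stored alongside the standard feature pattern.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class DictionaryEntry:
    code: str                          # character type code
    standard_pattern: Tuple[int, ...]  # standard feature pattern
    priority: int                      # 1 = most frequent


def flatten_category_sets(
    category_sets: Dict[int, Dict[str, Sequence[int]]]
) -> List[DictionaryEntry]:
    """Turn per-set dictionaries into one list with the priority in each entry."""
    entries: List[DictionaryEntry] = []
    for priority in sorted(category_sets):
        for code, pattern in category_sets[priority].items():
            entries.append(DictionaryEntry(code, tuple(pattern), priority))
    return entries


# Hypothetical category sets keyed by priority.
category_sets = {
    1: {"A": [1, 0, 1, 1]},
    2: {"C": [1, 1, 1, 1]},
    3: {"E": [1, 0, 1, 0]},
}
dictionary = flatten_category_sets(category_sets)
priority_by_code = {entry.code: entry.priority for entry in dictionary}
print(priority_by_code)
```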

(g) Effects of the Invention

According to the present invention, frequently used character type categories come higher in the recognition order, so the recognition rate improves. In addition, no enormous knowledge section is required, and the recognition order can be determined in a short processing time.

[Brief explanation of the drawing]

Fig. 1 is a block diagram of a character recognition system to which the present invention is applied, Fig. 2 is an explanatory diagram showing the contents of the knowledge section, and Figs. 3(a) to 3(d) are explanatory diagrams showing a concrete example of recognition order determination.

Explanation of main reference numerals: (1) character observation section, (2) feature extraction section, (3) dictionary section, (4) pattern matching section, (5) candidate memory, (6) knowledge section, (7) clustering control processing section, (8) result memory, (9) answer output control section.

Applicant: Sanyo Electric Co., Ltd. and one other. Agent: Patent Attorney Shizuo Sano.

Claims

[Claims]

(1) A recognition order determining system for a character recognition system that selects a plurality of candidate character type categories by calculating the degree of similarity between a feature pattern extracted from an input character pattern and standard feature patterns registered in advance in a dictionary section for each character type category, characterized in that a priority according to frequency is set for each of the character type categories, and the recognition order of the plurality of candidate character type categories is determined on the basis of the similarity obtained by the calculation and the priority.